CHAPTER 2

INTERPRETING SCIENTIFIC EVIDENCE

Expert scientific evidence usually involves the forensic scientist making an observation on some aspect of the case and, based on past experience, reporting inferences to the Court.  For example, the scientist may compare blood found at the scene with that of the accused and declare it to be of the same type, a type which may be quite rare in the general population. Our task is to see precisely what inferences can and cannot legitimately be drawn from such an observation. There is a simple and logical solution to this question, one that deals with many of the difficulties courts have perceived with expert evidence.

In later chapters we discuss how the expert should report such inferences and how the Court should interpret them, what weight the Court should give them and how they should be combined with other evidence to help the Court to decide the issues before it.  In the current chapter we consider how to evaluate a single item of evidence that is offered as proof of part of a party's case.

2.1 Relevance and Probative Value

The first requirement that any piece of evidence tendered in court must satisfy is that it be relevant.  In order to be considered, an item of evidence must be one that might rationally affect the decision.  If it cannot, then it is surely worthless.  A typical definition of relevance, which reflects that used in all common law systems, is found in Rule 401 of the US Federal Rules of Evidence:

"Relevant evidence" means evidence having any tendency to make the existence of any fact that is of consequence to the determination of the action more probable or less probable than it would be without the evidence.

Rather than the term "fact" in this book we will use the words "assertion" or "hypothesis" for what must be proved in either a civil or criminal case.  If an item of evidence does not cause us to change our probability assessment for the hypothesis then we would not normally describe it as evidence either for or against it. Thus if an item of evidence is worth considering it is one that might cause us to increase or decrease our assessment of the probability of some fact which is of significance in determining the case.[1]   "Good evidence", presumably, must have a substantial effect on our probability.  What is it about a piece of evidence which enables us to change our probability assessment?  To answer this question, consider some extreme cases.

Ideal and useless evidence

An ideal piece of evidence would be something that always occurs when the hypothesis or assertion we are trying to prove is true and never occurs when it is false.  An example is the old proverb "where there is smoke there is fire". In real life, evidence this good is almost impossible to find. Suppose a blind person needed to determine whether it were cloudy. Rain is not ideal evidence: if it is raining we can be sure there are clouds about, but there may also be clouds when it is not raining. Absence of rain does not imply absence of cloud.

At the other end of the scale, some information is certainly useless as evidence.  Imagine a child being interviewed because of suspected sexual abuse.  We seek factors which indicate abuse (or otherwise).  If we looked at 'all data' without discrimination we might note that she is breathing at the time of the interview.  After many such interviews we conclude that all children who allege abuse are breathing at the time of the interview.  But we know that this is useless as evidence of abuse simply because all other children breathe as well.  In other words, the child is equally likely to be breathing whether she has been abused or not.  Despite being a characteristic shared by all abused children, breathing is not any sort of evidence for abuse.  It does not discriminate between abuse and non-abuse.

Likewise a large proportion of the DNA in our cells is indistinguishable in all human beings.  This is why we nearly all have two eyes, two legs etc.  The presence of this material in DNA samples taken from the scene of a crime and from a suspect is useless as evidence of identification.  Since everyone shares such characteristics the finding is equally likely whether or not it was the accused who left the mark.[2]

Typical Evidence

In practice though, ideal evidence is seldom found.  Even if the evidence always occurs when the hypothesis is true it may also occur when it is not (for example, rain as evidence for clouds).  Alternatively, when the hypothesis is true the event may not invariably occur. Thus, in the real world, evidence is something that is more likely to occur when what we are trying to prove is true than when it is not.  Good or strong evidence would be something that was much more likely to occur when what we are trying to prove is true than when it is not.

For example, during a career of interviewing, a doctor might observe that a high proportion of abused children display signs of stress such as nail-biting.  This will be evidence of abuse if and only if abused children are more likely to bite their nails than other children.  If it turned out that abused and non-abused children are equally likely to bite their nails then this observation is useless as evidence of abuse. If abused children are much more likely to bite their nails than non-abused then we have strong evidence of abuse. Suppose 80% of abused children bite their nails but only 10% of other children do so.  Nail-biting would then be 8 times more likely in the case of an abused child than in another child.  If, on the other hand, 90% of other children bite their nails then nail-biting would point the other way and reduce the probability that the child had been abused (but only weakly).
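The arithmetic behind these comparisons can be sketched in a few lines of Python, using only the percentages quoted above:

```python
# Likelihood ratio: the probability of observing the evidence if the
# hypothesis is true, divided by the probability of observing it if
# the hypothesis is false.
def likelihood_ratio(p_if_true, p_if_false):
    return p_if_true / p_if_false

# 80% of abused children bite their nails; 10% of other children do.
lr_strong = likelihood_ratio(0.80, 0.10)   # 8: strong evidence of abuse

# If instead 90% of other children bite their nails, the same
# observation points (weakly) the other way.
lr_weak = likelihood_ratio(0.80, 0.90)     # about 0.9: weakly against
```

A ratio above 1 supports the hypothesis of abuse; a ratio below 1 counts against it.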

There are two points to notice.  First, that the strength (or probative value) of the evidence depends not only on how many abused children bite their nails but also on how many non-abused children do so. 

Second and most importantly, all we know at this stage is the probability of the evidence in each case. We do not know how likely it is that the child has been abused.

The probative value of any piece of evidence can be assessed in the same way.  A scientific test result is good evidence for a particular hypothesis if it is much more likely to occur when the hypothesis is true than when it is not.  We will know this only if we have seen the result of the test not only on a number of occasions when the hypothesis is true but also on occasions when it is false.  We cannot be sure of the strength of the evidence if we only observe cases where the hypothesis is true. But again, what we will know at the end of this is how likely the test result is if the hypothesis is true, and not how likely it is that the hypothesis is true.

A breath-testing machine

Consider a simple breath testing machine designed to be used at the roadside for checking whether a driver is over or under the legal alcohol limit.  It is supposed to show a red light if the driver is over the limit, a green light if he is under. It will have to be calibrated before being used to determine how accurate it is.  We must guard against two types of error: a false positive and a false negative.  A false positive (a red light shows when the person is actually under the limit) leads to someone being wrongly arrested and inconvenienced, at least by being required to be tested by the more accurate device at the police station.  A false negative (a green light shows when the person is really over the limit) leads to a drunk driver remaining on the road.

Unfortunately, reducing one of these rates by adjusting the settings of the device will inevitably increase the other.  There is presumably a reason for each false reading and technical improvement would reduce the errors.  But bearing in mind that we are trying to produce a cheap and robust device we may not be able to afford to investigate all the causes. It may be impossible in practice to eliminate errors altogether and we simply have a choice of which errors to make.  Which error is the more serious is a political question, but let us suppose some figures purely for the sake of example.

Before using it, we calibrate the testing machine with samples of air with a measured alcohol content.  Many such samples are tested.  Suppose, as a numerical example, we test 1,000 samples with an alcohol concentration marginally below the legal limit and 1,000 samples marginally above.  We adjust the device so that of the samples over the limit, 950 read red and 50 read green, and of the samples below the limit, 995 read green and 5 read red.[3]
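These calibration counts translate directly into error rates; a minimal sketch using the figures above:

```python
# Calibration results: 1,000 samples just over the limit,
# 1,000 samples just under (figures from the worked example).
over_red, over_green = 950, 50     # readings for the over-limit samples
under_red, under_green = 5, 995    # readings for the under-limit samples

# False negative: a green light for an over-limit sample.
false_negative_rate = over_green / (over_red + over_green)    # 0.05
# False positive: a red light for an under-limit sample.
false_positive_rate = under_red / (under_red + under_green)   # 0.005
```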

An aside on probability and odds

This section breaks the flow of our argument, but the simple ideas of probability and odds form such a fundamental part of the argument presented in this book that it is important to remind the reader of them. This section is therefore a quick revision of the ideas of probability and odds before we go further.  Readers already familiar with the concepts should omit this section.

Probability is a rational measure of the degree of belief in the truth of an assertion based on information. The hypothesis, assertion, or premise is itself either true or false. For example, the assertion "The driver is over the drinking limit" is either true or false but we may not be sure of whether it is true or not. Our degree of belief about the truth of the assertion is measured by our assessment of its probability.

The value of every probability depends on the assumptions and information used in assessing it. Thus, we would assess a different probability for the assertion "the driver is over the drinking limit" if we had evidence of the result of the breath test than without that evidence. All the evidence that is used to assess a probability is known as the condition for the probability. All probabilities are conditional on the evidence used.

Evidence is also described in the form of assertions. Thus "the light showed red", if true, is evidence for the hypothesis that "the person is over the limit" and would determine a different probability than would be assessed if the evidence "the light showed green" were true. That, again, would generally be different if we either had no evidence of the colour of the light or had additional evidence of erratic driving.

Probabilities take values between 0 and 1.[4] A probability of 0 means that (on the basis of the evidence listed in the condition) the assertion is impossible and we therefore believe it to be false.

A probability of 1 means that, given the condition, the assertion is true. Thus my probability for the assertion "the sun will rise tomorrow" given my knowledge of the working of the solar system is 1.

Most assessments of probability fall between these limits.[5] A probability of 0.5 for an assertion means that we are equally sure (or equally unsure) that the assertion is true and that its negation is true.

Probabilities can be represented as percentages by multiplying them by 100%.  A probability of 0.5 could be described as a probability of 50%, one of 0.3 as a probability of 30%. We will often use percentages in this book.

We are also going to need to describe probabilities in the form of odds. Many people are familiar with odds, if only from betting, and recognise that they are a description of uncertainty, like probability. But not everyone realises that odds are only another way of representing probability, and that one can go from one form to the other quite easily.

To get the odds from the probability of an assertion you calculate the ratio of the probability to (1 - the probability) and simplify as much as possible. Thus a probability of 0.3 has equivalent odds of

0.3/(1 - 0.3) = 0.3/0.7 = 3/7

This could also be written as odds of 3 to 7 (in favour of the assertion).

Odds corresponding to a probability of 0.5 are

0.5/(1 - 0.5) = 0.5/0.5 = 1/1

These odds would be described as 1 to 1 or evens.

Odds of less than evens are sometimes reversed and described as "odds against" the assertion. Odds of 3 to 7 in favour of an assertion might, instead, be described as odds of 7 to 3 against.[6]

To return from odds to probability is a little trickier. One calculates the fraction: the numerator of the odds divided by the sum of the numerator and the denominator of the odds.  Thus odds of 3 to 7 would be the same as a probability of

3/(3 + 7) = 3/10 = 0.3
Even odds (1 to 1) correspond to a probability of 1/(1+1)=1/2=0.5.
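The two conversions can be expressed as a pair of small functions; a sketch using exact fractions to avoid rounding:

```python
from fractions import Fraction

def probability_to_odds(p):
    # Odds in favour = probability / (1 - probability).
    return p / (1 - p)

def odds_to_probability(numerator, denominator):
    # Probability = numerator / (numerator + denominator).
    return Fraction(numerator, numerator + denominator)

# A probability of 0.3 corresponds to odds of 3 to 7 ...
odds = probability_to_odds(Fraction(3, 10))   # Fraction(3, 7)
# ... and odds of 3 to 7 take us back to a probability of 0.3.
prob = odds_to_probability(3, 7)              # Fraction(3, 10)
# Even odds (1 to 1) correspond to a probability of one half.
half = odds_to_probability(1, 1)              # Fraction(1, 2)
```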

The Breath tester again

From the data from the calibration tests we can see that

(1) If the sample is over the limit there is a 95% probability (950/1000) that the device will indicate red and a 5% probability (50/1000) that it will indicate green; the odds are 19 to 1 that it will indicate red if the sample is over the limit.[7]

(2) If the sample is under the limit there is a 0.5% probability that the device will indicate red and a 99.5% probability that it will indicate green; the odds are 199 to 1 that it will indicate green if the sample is under the limit.[8]

We can see from the calibration data that a red light on the breath test is pretty good evidence for the assertion "the person is over the limit".  If a person is over the limit there is a 95% probability of a red light; if a person is under the limit there is only a 0.5% probability of a red light.  Thus a red light is 190 times more likely to occur if the subject is over the limit than if under.  (95 / 0.5 = 190).

In contrast, a green light is good evidence against the assertion "the person is over the limit". If a person is over the limit there is a 5% probability of a green light; if under the limit there is a 99.5% probability of a green light. A green light is about 19.9 times less likely to occur if the person is over the limit than if under (5/99.5 = 1/19.9). So, depending on the light shown, the tester can provide good evidence either for or against the assertion "the person is over the limit".
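Both ratios can be checked in a few lines, using the calibration probabilities quoted above:

```python
# Calibration probabilities for the roadside breath tester.
p_red_if_over, p_red_if_under = 0.95, 0.005
p_green_if_over, p_green_if_under = 0.05, 0.995

# A red light is 190 times more likely if the driver is over the limit.
lr_red = p_red_if_over / p_red_if_under                  # 190

# A green light is about 19.9 times more likely if the driver is under.
green_favours_under = p_green_if_under / p_green_if_over  # about 19.9
```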

But again,  let us re-emphasise, this is only telling us the probative value of the evidence and not the probability that the person is over (or under) the limit.  We are sure that the breath tester is a good piece of evidence which means that it should cause us to change our assessment of the probability that the person is over the limit.  But how exactly is this to be done?

2.2 The Likelihood Ratio and Bayes' Rule

The information at this stage is the 'wrong way round'.  We knew the contents of the samples and we have determined the probability of getting a red signal given that the sample is over the limit.  But when the device is used we want to know something quite different: if the device gives a red light, what is the probability that the person is over the limit?[9]

Early in the history of probability theory attention was devoted to the difficulty that the evidence is the 'wrong way round'. This was known as the problem of inverse probabilities.[10]  The solution of a particular case was discovered by the Rev. Thomas Bayes (1702-1761) and published posthumously in 1763. His work was extended by Pierre Simon, Marquis de Laplace (1749-1827).  They proved, in what is now known as Bayes' Rule, that the value of a piece of evidence in testing a particular assertion against an alternative is determined by its Likelihood Ratio.

The Likelihood Ratio

We have already met the Likelihood Ratio in considering the evidence of nail-biting and the evidence of the breath tester in Section 2.1, above.  In the child-abuse case the Likelihood Ratio for nail-biting is the 80% chance of nail-biting if the child has been abused divided by the 10% chance of nail-biting if the child has not been abused, 80/10 or 8.  In the breath test example above the Likelihood Ratio for a red light is:

95% / 0.5% = 190

where the probabilities are those of getting a red light supposing the two conditions (over and not over the limit).[11]

The Likelihood Ratio, then, is the probability of the evidence supposing our assertion is true divided by the probability of the evidence if the assertion is not true.  The probability of the evidence supposing the assertion is true, is the numerator.  The probability of the evidence if the assertion were not true, is the denominator.  When we divide them we get a single figure, a ratio which tells us the strength of the evidence in supporting our hypothesis.

If the Likelihood Ratio is more than 1 the evidence tells in favour of the hypothesis.  If the Ratio is less than 1 (usually expressed as a decimal fraction) then it tells against the hypothesis.  If the ratio is exactly 1 then the evidence is neutral.  The strength of the evidence is measured by how much the Likelihood Ratio differs from 1, in either direction.

Bayes' Rule

Bayes' Rule is a logical theorem: there can be no doubt about its truth. It tells us how to update our knowledge by incorporating new evidence.  We start with some knowledge about the hypothesis, expressed as odds in favour of it. These are known as the prior odds. The prior odds (our assessment without the evidence) must be multiplied by the Likelihood Ratio of the new piece of evidence to give the posterior odds.[12]  The posterior odds are what we want to know: the odds in favour of the hypothesis after taking into account the new piece of evidence.

prior odds × Likelihood Ratio = posterior odds

Returning to the breath tester, a red light makes the odds that the person was over the limit 190 times greater than we would have assessed them to be without it. 

So we must first consider how likely it was that the person was over the limit before we consider the evidence of the breath test.  In other words, what were the prior odds that the person was over the limit?  If the driver was stopped for no particular reason (for so-called 'random testing') these odds may just reflect the proportion of drivers at that time of day who are over the limit.  There might be only 1 out of 100 drivers who are over the limit. The prior odds would then be 1 to 99. Usually, though, there will be evidence such as erratic driving which alerted the police and caused the driver to be stopped.  If, on the basis of such evidence, it is believed that there is a 50/50 chance (odds of 1 to 1, or evens) that the driver is over the limit before testing, the odds in favour of that proposition after seeing the red light become 190 to 1.[13]
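In code, the update is a single multiplication; a sketch applying it to the two priors just discussed:

```python
# Bayes' Rule in odds form: posterior odds = prior odds x likelihood ratio.
def posterior_odds(prior_odds, likelihood_ratio):
    return prior_odds * likelihood_ratio

LR_RED = 190   # likelihood ratio of a red light, from the calibration data

# Random testing: perhaps 1 driver in 100 over the limit, prior odds 1 to 99.
random_stop = posterior_odds(1 / 99, LR_RED)   # about 1.9 to 1

# Erratic driving: a 50/50 chance before testing, prior odds of evens (1).
erratic = posterior_odds(1, LR_RED)            # 190 to 1
```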

The effect of  Prior Odds

With a different prior probability, one will get a different posterior assessment.  Suppose that you knew that you certainly had not had a drink for over 48 hours but the device gave a red light when you blew into it.  You would probably conclude that the device had given a false reading, although there is a slight possibility that alcohol had been retained in your body for a long time.  So, for example, you might have assessed the odds that you were truly over the limit as 10,000 to 1 against before taking the breath test, that is, odds of 1 to 10,000 in favour of the proposition that you were over the limit.  After taking a test with a Likelihood Ratio of 190, as we just calculated, you should now consider that the odds that you are over the limit are about (1/10,000)(190) = 19/1000.  This is about 53 to 1 against being over the limit.  It is still very improbable but not nearly so improbable as before.

The officer performing the test, not knowing your history, may have different prior odds. Even if he assesses the odds that you were over the limit, prior to administering the test, at only 1 to 10 (a probability of about 9%), he would now believe that the odds were (1/10)(190) = 19 to 1 in favour of that assertion.[14]
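The same multiplication shows how sharply the conclusion depends on the prior odds; a sketch with the figures above:

```python
LR_RED = 190   # likelihood ratio of a red light

# The abstinent driver: prior odds of 1 to 10,000 of being over the limit.
driver = (1 / 10_000) * LR_RED    # 19/1000, i.e. about 53 to 1 against

# The officer: prior odds of 1 to 10 of the driver being over the limit.
officer = (1 / 10) * LR_RED       # 19 to 1 in favour

# Same evidence, same likelihood ratio, very different posterior odds.
odds_against_driver = 1 / driver  # about 53
```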

An HIV Test

Very large or very small prior odds can give some startling effects.  For example in testing for HIV among potential blood donors, it is very important to avoid false negatives.  That is to say someone with HIV should not be given a negative result (told they don't have HIV) by mistake.  In tests used in 1985, in order to avoid these false negatives a false positive rate of 1% had to be accepted.[15]  This meant that on average out of every 100 tests administered to those who are virus-free, 1 test wrongly said that HIV was present.  Assume that the test gave no false negatives.  A positive result was certain to occur if the subject carried HIV and had a 1% chance of occurring if the subject did not.  In other words a positive result had a Likelihood Ratio of 1/0.01 = 100. 

This sounds (and is) very accurate but a curious result is obtained when we combine the evidence with the prior odds.  At that time, probably fewer than 1 in 10,000 potential blood donors had the virus so the prior odds that the person had HIV were 1 to 10,000.  The Likelihood Ratio of the positive result is 100. Multiplying the prior odds in favour of infection by the Likelihood Ratio gives the posterior odds of 100 to 10,000 = 1 to 100 (i.e. 100 to 1 against).  In other words even given a positive test it was still highly unlikely that the person had the virus.

To see this more easily, imagine 10,001 tests, of 10,000 subjects without the virus and 1 infected.  We expect to record the 1 real infection and to obtain no false negatives.  But we would also expect to record 100 false positives from those without the infection, (with the remaining 9900 as true negatives).[16]  Thus we would expect 101 positive results only one of whom actually carries HIV.  So the posterior odds, of carrying HIV after considering a positive result, are 1 to 100 (or 100 to 1 against the subject being infected Ñ as demonstrated above).  This is why a second, independent test must be administered when a positive result occurs.

Using imaginary figures, if the second test result had a Likelihood Ratio of 1500, the posterior odds after both tests would be (1/10,000)(100)(1500) = 15 to 1 in favour of the assertion of the presence of HIV. The problem of combining evidence in this way is discussed in more detail in Chapter 5.
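The frequency argument and the odds calculation can be checked against each other; a sketch using the figures in the text (the Likelihood Ratio of 1500 for the second test is, as the text says, imaginary):

```python
# Imagine 10,001 potential donors: 1 carries HIV, 10,000 do not.
# The test has no false negatives and a 1% false positive rate.
true_positives = 1
false_positives = round(10_000 * 0.01)               # 100 expected
total_positives = true_positives + false_positives   # 101

# Posterior odds of infection given one positive result: 1 to 100.
odds_after_one_test = true_positives / false_positives   # 0.01

# The same answer via Bayes' Rule: prior odds 1 to 10,000, LR of 100.
odds_via_bayes = (1 / 10_000) * 100                      # 0.01

# A second, independent test with an (imaginary) LR of 1500 multiplies in:
odds_after_two_tests = (1 / 10_000) * 100 * 1500         # 15 to 1 in favour
```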

Transposing the Conditional

We now mention one of the most common mistakes that are made in dealing with evidence. It is considered in more detail in Chapter 6.  It is too easy, and quite wrong, to slip from the knowledge that there is only a 1% chance of a positive result if you do not carry the HIV virus to the conclusion that there is only a 1% chance that you do not carry HIV if you get a positive test result.  This is to ignore the prior odds, in other words to ignore what we already know about the matter.  Likewise, the fact that there was a 95% chance of a red light if the subject is over the limit does not mean that there is a 95% chance that the subject is over the limit if there is a red light.[17]  This common error is known as Transposing the Conditional[18] since the condition (the assertion that the subject is carrying the HIV virus) is swapped with the evidence (of the test result).

The error can be recognised clearly in a case where nobody would make a mistake.  Consider the probability that a person, known to be over 6 feet tall, is a man.  This is obviously high; most 6-footers are men.  Now, in contrast, consider the probability that a person known to be a man is over 6 feet tall.  This is obviously not the same, since only a small proportion of men are over 6 feet tall. In the one case we are considering the probability that "the person is over 6 feet" and in the other the probability that "the person is a man".  The two cases also have different conditions. The confusion arises because the assertion in one statement is the evidence in the other (and vice versa).

To look at this numerically, suppose that 5% of men but only 0.5% of women are at least 6 feet tall.  Knowing that a person is at least 6 feet tall should multiply our assessment of the odds in favour of their being a man by the Likelihood Ratio of 5/0.5 = 10.  Assuming that the numbers of men and women in the population are roughly equal, the prior odds that a person is a man are 1 to 1.  If we are told that the person is at least 6 feet tall then the posterior odds that they are a man are (1/1)(10) = 10 to 1.

Suppose we had originally been given the information that the 6-foot person, whose sex we are considering, was a nurse.  We know from the information above that the fact that someone is over 6 feet tall multiplies the odds that they are a man by 10.  But the fact that someone is a nurse gives us prior odds that they are a man based on the small proportion of nurses who are men.  If only 2% of nurses are men this gives prior odds of 1 to 49.  The evidence that the person is over 6 feet tall multiplies these odds by 10.  This gives posterior odds in favour of being a man of 10 to 49 (odds of about 5 to 1 against being a man).  It is important to realise that the value of the evidence of height has not itself changed, but a 6-foot nurse is still much more likely to be a woman than a man because of the huge imbalance of the sexes in that profession.
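The numbers in this example can be worked through mechanically; a sketch, keeping the likelihood ratio fixed while the prior changes:

```python
# P(at least 6 feet | man) = 5%; P(at least 6 feet | woman) = 0.5%.
lr_height = 0.05 / 0.005          # 10: height favours "man" tenfold

# General population: men and women roughly equal, prior odds 1 to 1.
general = 1 * lr_height           # posterior odds 10 to 1 in favour of a man

# A nurse: if only 2% of nurses are men, prior odds are 2 to 98 = 1 to 49.
nurse_prior = 2 / 98
nurse = nurse_prior * lr_height   # 10/49, about 5 to 1 against being a man
```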

Giving evidence

If a case involving the breath test device were contested in court a forensic scientist might give evidence about the result of the test.  What evidence could he give?  He could not tell us the probability that the person was over the limit, since to do so he would have to hear and consider all the other evidence to assess the prior probability.  And that is really the job of the court.[19] What he could tell us, in this simplified example, is that a positive test should multiply the prior odds that the person was over the limit by 190. That is, he should state the Likelihood Ratio, and that is all he should say.

It follows that we cannot determine the probability of guilt (or presence at the scene, or paternity, or whatever else is to be proved) simply on the basis of the expert evidence.  We must have the prior odds as well.  But the task of determining the prior odds is a task for the judge or jury and not for the expert, who is not privy to the rest of the evidence in the case.

So, expert evidence should be restricted to the Likelihood Ratio given by the test or observation (or its components).  If an expert purports to give a probability for the hypothesis he must be assuming some prior.  This is wrong in both law and logic.

2.3 Admissibility and Relevance

As we have seen, an item of evidence will change the odds if it has a Likelihood Ratio different from 1. If the Likelihood Ratio is greater than 1 then the evidence will cause our assessment of probability for the assertion to increase.  If it is less than 1 then our assessment of the probability should decrease.  Hence any piece of evidence giving a Likelihood Ratio other than 1 is relevant and, in principle, all relevant information should be used in coming to a rational assessment of the probability.[20]

To assess a Likelihood Ratio it is not essential to have precise numbers for each of the probabilities. The value of the evidence depends upon the ratio of these numbers. So, if we believe that the evidence is 10 times more probable under one hypothesis than the other, the Likelihood Ratio is 10, whatever the precise values of the numerator and denominator may be. Often we will be able to assess this ratio roughly on the basis of our general knowledge and experience. Saying that evidence is relevant is just another way of saying that it is more probable under one hypothesis than another and therefore has a Likelihood Ratio different from 1.

Unfortunately, courts and commentators have often used the word "relevant" to mean something more complicated.  There is always a cost, in terms of money, time, multiplication of issues, or possible prejudice, in introducing any piece of evidence.  The probative value of the evidence must be weighed against these costs.  US Federal Rule 403 provides:

Although relevant, evidence may be excluded if its probative value is substantially outweighed by the danger of unfair prejudice, confusion of the issues, or misleading the jury, or by considerations of undue delay, waste of time or needless presentation of cumulative evidence.

"Probative value" is clearly directly related to the Likelihood Ratio.  The further the Likelihood Ratio is from 1 (in either direction) the greater is the evidence's probative value.

Evidence with a Likelihood Ratio not far from 1 (say only 0.8 or 4), will have low probative value and might not be worth admitting if the cost (in the wider sense described) is too high. Some new types of evidence may well have such low values of Likelihood Ratio. It is up to the proponents of these new types of evidence to demonstrate their Likelihood Ratio on the basis of independent tests.

Some people distinguish between relevance and probative value while others refer to "degree of relevance".  What is not helpful is to use "relevant" to refer to the outcome of this balancing of probative value against the cost of admitting the evidence.  These two considerations must be kept separate, as they are by the US Federal Rules.

The problem for a judge is to determine the relevance or probative value of an individual item of evidence without examining the entire case. One of the objects of Rule 403 is to save time and expense and this will not be achieved if at an admissibility hearing the evidence is canvassed as fully as in open court.  Somehow, the judge  has to estimate what the probative value of the proposed evidence may be (that is, in our terms, what its Likelihood Ratio is) and balance that against the wider costs of admission.  If the mere question of admissibility will cause substantial argument and expense and one believes that the probative value of the evidence will be low then this itself may be a reason for refusing to admit it.[21] 

On the other hand, when examining forensic scientific evidence, there is a tendency to demand very high Likelihood Ratios. Sometimes DNA evidence, as we shall see in later chapters, can have Likelihood Ratios in the millions. Hodgkinson refers throughout to the need for the evidence to be of "high probative value".[22]   It seems that courts might regard the evidence as almost useless if the Likelihood Ratio is less than 100.  In the Australian case R v Tran[23] aspersions were cast on the DNA evidence because the Likelihood Ratio may have been as low as 87, but in other cases courts have recognised that DNA Likelihood Ratios as low as 72 and 40 are relevant evidence.[24]

Values as low as that may actually compare quite favourably with much evidence that is traditionally admitted, such as eye-witness descriptions and identifications, although it may be difficult to obtain data to establish their Likelihood Ratios.  There seems no special reason why forensic scientific evidence should be subject to any more rigorous conditions.  Always assuming that the evidence does not fall foul of some other exclusionary rule, a blood-grouping test giving a Likelihood Ratio of only 4 or 5 should not be rejected on that ground alone. The question is whether there is sufficient other evidence to combine with it to attain the required standard of proof. Combining evidence is discussed more fully in Chapter 5.

Prejudging the case?

To calculate the numerator of the Likelihood Ratio one has to assess the probability of the evidence supposing that the prosecution case were true.  This has led some to believe that calculating a Likelihood Ratio involves assuming that the prosecution case is true.[25]  This is misconceived.  One is only considering how probable the evidence would be supposing (for the sake of argument) the prosecution case were true and then comparing that with the probability of the evidence supposing that the defence case were true.  This process requires no level of belief in either of these hypotheses. Furthermore it merely makes explicit the logical reasoning process naturally applied to any piece of evidence.  If a juror thinks that a particular piece of evidence is incriminating this can only be because the juror thinks that the evidence is more probable if the prosecution case is true than if the defence case is.  If we were to make this objection to all evidence any sort of rational inference would become impossible.

2.4 Case Studies

At the end of appropriate chapters we shall discuss some real cases which illuminate points made in the body of the chapter. Here we look at a case where a Likelihood Ratio was given in court, some problems with paternity cases, and psychological evidence in child sex-abuse cases.

A case involving DNA evidence

The New Zealand case R v Pengelly[26] provides an example which helps us to see what should be done with the evidence. The case concerned a murder in Auckland, in the course of which the assailant cut himself and left bloodstains at the scene. These were analysed using a DNA profile (see Chapter 9). In court, the forensic scientist, Dr Margaret Lawton, described her results by saying:

In the analysis of the results I carried out I considered two alternatives: either that the blood samples originated from Pengelly or that the ... blood was from another individual. I find that the results I obtained were at least 12,450 times more likely to have occurred if the blood had originated from Pengelly than if it had originated from someone else.[27]

 Question: Can you express that in another way?

Answer: It could also be said that 1 in 12,450 people would have the same profile ... and that Pengelly was included in that number.

Although she did not use the term, the witness had stated the Likelihood Ratio for the evidence on the two hypotheses that the blood came from Pengelly and that it instead came from a randomly selected person.  This Likelihood Ratio had then to be multiplied by the prior odds. 

There are two ways to do this.  One is to consider the DNA evidence as the first item of evidence and determine the prior odds by asking: what is the population from which the perpetrator could have come?  As the population of Auckland is approximately one million we would assign prior odds (that is, prior to any evidence) of about 1 to 1,000,000 that Pengelly was the killer.[28] When we multiply those (conservative) odds by the Likelihood Ratio of 12,450 we get for the posterior odds

(1/1,000,000) × 12,450 = 12,450/1,000,000 ≈ 1/80

These are odds of about 1 to 80 that Pengelly is guilty (that is, about 80 to 1 against his guilt).  In other words, instead of being 1 out of a million people who might have committed the murder, Pengelly was 1 out of only about 80 who could have been guilty.  The effect of the evidence is to change the odds against Pengelly's guilt from 1,000,000 to 1 down to about 80 to 1.  Further evidence was therefore needed before Pengelly could be convicted.

Alternatively, one could consider the other evidence first and come to a judgement of prior odds based upon that.  The other evidence in the case pointed to quite a small group, including Pengelly, which probably contained the perpetrator, such that the prior odds were about 1 to 4.  When these prior odds are multiplied by the Likelihood Ratio of 12,450 we get for the posterior odds

(1/4) × 12,450 = 12,450/4 ≈ 3,112

Thus the posterior odds are over 3,000 to 1 in favour of the assertion that Pengelly is guilty. This is equivalent to a probability of over 99.9%.[29]
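The two routes just described are the same arithmetic applied with different prior odds. A minimal sketch (the function name is ours; the figures are those used in the text):

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Bayes' theorem in odds form: posterior odds = prior odds x Likelihood Ratio."""
    return prior_odds * likelihood_ratio

LR = 12_450  # the Likelihood Ratio stated in R v Pengelly

# Route 1: take the DNA evidence first, with prior odds of 1 to 1,000,000
# (roughly the population of Auckland).
route1 = posterior_odds(1 / 1_000_000, LR)  # about 0.0125, i.e. odds of about 1 to 80

# Route 2: take the other evidence first, giving prior odds of about 1 to 4.
route2 = posterior_odds(1 / 4, LR)  # 3112.5, i.e. over 3,000 to 1 in favour
```

Either route is legitimate; they differ only in the order in which the items of evidence are combined.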

For reasons we discuss later Dr Lawton did not attempt to give the jury direct guidance on how to handle the Likelihood Ratio.  However the important point to note is that, correctly and consistently with the argument in this book, at no stage did she express an opinion as to the probability that the blood came from Pengelly.  She summed up her evidence by saying that the Likelihood Ratio of 12,450 "very strongly supports the premise that the two blood stains examined ... came from Pengelly."

The Probability of Paternity

Experts commonly testify to the probability of assertions.  This is particularly so in paternity cases, where courts are used to hearing witnesses give a probability that X is the father of Y.

Thus in the New Zealand case Byers v. Nicholls [30] we find testimony that "evidence of the [tests] indicates that there is a 99% probability for [Byers] being the father of [the girl]."  This typical statement follows a formula advocated as long ago as 1938.[31]  It is still in common use in several jurisdictions despite having been exposed as fallacious by eminent US scholars.[32]

Experts who adopt this method of giving evidence commit three major errors.  The first is that they have assumed prior odds which have no connection with the facts of the particular case.  Before we can state a probability or odds for any assertion, we must assess prior odds.  These odds will depend upon the other evidence in the case.  But experts in paternity cases have developed the habit of routinely assuming prior odds of 1 to 1 (evens) on the grounds that they know nothing about the case.  Now, with no information about a matter and only two hypotheses, such odds may indeed be the appropriate assumption.[33]  But the experts certainly know something and the court probably knows more.  They know that this is a case in which one person has fathered a child and that that person's identity is in doubt.  Why take prior odds of 1 to 1?  Why not take prior odds of 1 to the male population of the world (say, 1 to 2 billion) or 1 to the male population of the country?  Unless the Likelihood Ratio yielded by the evidence is astronomically high (which with modern DNA testing it might well be) the assessment of the prior will substantially affect the eventual probability.
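The sensitivity of the result to the assumed prior is easy to demonstrate numerically. In this sketch the Likelihood Ratio of 10,000 and both priors are illustrative figures of ours, not taken from any case:

```python
def posterior_probability(prior_odds, likelihood_ratio):
    """Posterior probability from prior odds (in favour) and a Likelihood Ratio."""
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (posterior_odds + 1)

LR = 10_000  # illustrative paternity-test Likelihood Ratio

# The conventional 'evens' prior gives the familiar-looking figure of 99.99%...
p_evens = posterior_probability(1, LR)

# ...but a prior of 1 to 2 billion (one man out of the world's male population)
# yields a probability of only about 0.0005% from exactly the same evidence.
p_world = posterior_probability(1 / 2_000_000_000, LR)
```

The same Likelihood Ratio thus produces radically different "probabilities of paternity" depending on an assumption the expert has no business making.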

Secondly the witness is assuming an alternative hypothesis, that if the father were not the defendant it was some randomly selected male from the population, again without reference to the facts of the individual case. This choice of alternative hypothesis is discussed in Chapter 3. 

Thirdly, and worst of all, the witness may conceal these assumptions by wrapping them up in a single probability.  We shall also see in Chapter 5 that it is quite impossible to take evidence given in this form and combine it with the other evidence in the case. 

Sometimes the expert openly states what the assumed prior is and says (for example) "assuming a prior of evens, and considering the odds against a match by chance, on the basis of this evidence alone, I believe the odds in favour of X being the father of Y are 10,000 to 1, that is, a probability of 99.99%".[34]  As it stands this is not incorrect but it introduces extraneous material and confuses the issues.

In some jurisdictions in criminal cases witnesses state in a similar way the probability that the accused left a mark found at the scene of the crime.  Such evidence is sometimes called the 'probability of contact' and is common in the USA.  In a German case the witness gave a 'probability of contact' but made clear that it was based on a prior of evens (in other words that without the evidence the accused was as likely as not to be the person who left the mark).  The court said [in translation]

The conversion of the probability of the characteristics of 0.014% into a probability of incrimination ... of 99.986% requires, as the expert witness Dr H has set out to the Supreme Court, the establishment of a prior probability.  One can only reach a result of 99.986% if a prior probability of 50% is assumed.  That means ... that before the DNA analysis the probability that the seminal fluid is from the accused is as high as the probability that it is not.  The expert witness, who should only report about the result of this DNA analysis, could start from this (neutral) prior probability.  The Court had to be aware that the result of the expert witness's opinion only makes an abstract statement about the statistical probability of incrimination.  This result is not allowed to be treated as the equivalent of the concrete incrimination of the accused.[35]

The court was left with no guidance about how to use this 'abstract probability' as evidence.  Had such guidance been given, it would have been to combine the Likelihood Ratio for the evidence with the prior odds the court (not the expert) had assessed on the basis of the other evidence.  This would have made the expert's prior redundant.  The expert should simply have stated the Likelihood Ratio for the evidence.

Whenever expert witnesses purport to assess the probability of an hypothesis they should be questioned to establish the assumptions which have been built into their prior odds and to establish the true value of their evidence in the context of the particular case.  The only exceptions appear to be evidence like fingerprints and handwriting where experts assert that two impressions did or did not come from the same source.  The reasons for these exceptions are discussed later.

Although the courts have become accustomed to receiving such evidence (especially the 'probability of paternity') they have such difficulty in dealing with it that it is in precisely these cases that we find the courts agonising over the relationship between "statistical" and "legal" probability.  Thus in the English case Re J.S. (a minor)  Ormrod LJ said:

 The concept of 'probability' in the legal sense is certainly different from the mathematical concept; indeed, it is rare to find a situation in which these two usages co-exist, although when they do, the mathematical probability has to be taken into the assessment of probability in the legal sense and given its appropriate weight.[36]  

We believe this distinction is artificial. It is the giving of evidence of a probability of contact or of paternity which leads to the belief that there is such a thing as a 'mathematical probability' or an 'abstract statistical probability' which plays no part in common sense reasoning.  The solution is that experts should not  give evidence in this fashion. 

Child Sexual Abuse

The logic explained in this chapter can also help to untangle cases where evidence is not given in the form of numbers.  The Likelihood Ratios we have seen above happen to have been derived from statistical surveys or series of scientific measurements but our aim is to make the best possible use of all the information we have, in order to decide a particular case, including evidence which is not statistical in form.  The concept of a Likelihood Ratio for the evidence, even if we cannot state a precise numerical figure in every case, provides the appropriate logical tool for doing this.

In the New Zealand case R v B[37] a man was accused of sexually assaulting his adopted daughter.  A psychologist gave evidence of a number of tests and observations she had carried out while interviewing the girl.  Some of these were formalised tests such as the Family Relations Test and the Rother Incomplete Sentences Test.  Others were simply observations of the matters the child talked about, for example her dreams and her self-image.  In discussing each observation the psychologist made some comment such as:

"[this] is typical of sexually abused girls/children/young persons"

save for the dreams about which she said

"dreams of this kind are frequently experienced by sexually abused young people."

The Court regarded the psychologist's evidence as inadmissible for a number of reasons one of which was:

"its admission must inevitably lead to the jury learning the expert's opinion on the very issue it is required to answer É large parts of the É evidence clearly reflect the psychologist's view that she was examining a child who had been sexually abused by her father"[38]

In another New Zealand case just two years later, R v S[39] a psychologist gave evidence of a number of characteristics presented by a child alleging sexual abuse such as self-mutilation, lack of eye contact and unwillingness to talk about home life.  Before detailing these she had been asked the question:

"Did [the complainant] exhibit any characteristics which were consistent with what you had come to know as the characteristics of sexually abused children?"

to which she replied

"very definitely".

In the first case the expert is saying that she has examined a number of abused children and a high proportion of them exhibit these signs.  In other words she was giving the probability of finding the behaviour supposing that the child has been abused.[40]  Most explicitly "dreams of this kind are frequently experienced by sexually abused young people".  In R v S the expert was asked whether the child exhibited characteristics she had come to know as the characteristics of sexually abused children.  Again this clearly means characteristics frequently met in abused children. The probability of these characteristics is high supposing the child has been abused.

Of course, what the jury had to decide in each case was the probability that the child had been abused, given the psychologist's evidence and all the other evidence in the case.[41]  We can now see that the psychologist has not provided all the information the jury needs.  First there must be prior odds, though that is not the sole responsibility of the expert.  The prior odds may be provided by other evidence or might simply be the result of a survey as to the occurrence of the relevant type of abuse.  Then the court also needs to know how probable the evidence is if the child had not been abused.[42]  The Court of Appeal rejected the evidence for a number of reasons which missed the real issues, but it approached this point when it pointed out that "some at least of those characteristics ... may very well occur in children who have problems other than sexual abuse".  In other words there may have been alternative explanations for the evidence.  Alternative explanations will be discussed in the next chapter.
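The missing ingredient the Court pointed towards can be expressed as a Likelihood Ratio, even though no figures were given in evidence. In this sketch both frequencies are invented purely for illustration; no such figures appear in the cases:

```python
def likelihood_ratio(p_evidence_if_true, p_evidence_if_not):
    """LR = Prob(behaviour | abuse) / Prob(behaviour | no abuse)."""
    return p_evidence_if_true / p_evidence_if_not

# Suppose (hypothetically) the behaviour is seen in 60% of abused children.
# If it is also seen in 30% of children with other problems, the evidence is
# only weakly probative; if it is seen in only 1%, it is strongly probative.
lr_weak = likelihood_ratio(0.60, 0.30)    # LR of 2
lr_strong = likelihood_ratio(0.60, 0.01)  # LR of 60
```

The value of the psychologist's evidence therefore depends just as much on how common the behaviour is among children who have not been abused, a figure the witnesses never supplied.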

2.5 Summary

•  A forensic scientist cannot tell us how probable the prosecution case is but only how much more probable the evidence is if the prosecution case is true than if it is not.

•  The figure which expresses this comparison is the Likelihood Ratio.

•  In principle, evidence will be relevant when the Likelihood Ratio is less than or greater than 1.  A Likelihood Ratio of 1 means that the evidence is unhelpful and irrelevant.

•  Although relevant, evidence may be excluded by an exclusionary rule or because its probative value (measured by the Likelihood Ratio) is not sufficient to overcome the cost of admitting it in terms of time, money, confusion, or prejudice.

 



[1] Montrose uses the term 'material' rather than significant.  See Montrose,J.L., 'Basic Concepts in the Law of Evidence'  (1954) 79 L Q R 527.

[2]  Lempert, R., 'Some caveats concerning DNA as criminal identification evidence; with thanks to the Reverend Bayes'  (1991) 132 Cardozo L R 303-342.

[3]  We must make two points about this exposition. First the figures have been chosen simply for arithmetic simplicity and may well be wrong by orders of magnitude.  Secondly the problem has been simplified.  In fact the probability of a false reading will decline as one moves away from the limit so that the chances of a false positive from a sample substantially under the limit will be negligible.

[4]The reasons for this are explained in our paper Robertson, B.W.N. and Vignaux, G.A., 'Probability - the Logic of the Law'  (1993) 13 OJLS 457.

[5] We are even warned to be aware of the possibility of an unexpected astronomical calamity which would prevent the sun rising tomorrow, and therefore never to assess the probability of a "certain" assertion as exactly 1 but as a value minutely less.

[6]Odds are also sometimes described just as a fraction or its decimal equivalent. Odds of 3 to 7 might be stated as 0.43=3/7; evens as 1.0=1/1.  This presentation always has the potential to be confused with a true fraction however and we shall not use it.

[7] It is much, much clearer and more precise to express this in symbols:  Probability(red | over the limit) = 0.95 where the symbol "|" stands for "given the condition" or just "given".  Similarly,  Probability(green | over) = 0.05.

[8] Prob(red | under the limit) = 0.005 and Prob(green | under) = 0.995.

[9]  What we want is Probability(over the limit | red light).

[10]  For some interestingly written history see Gigerenzer, G., et al, The Empire of Chance, CUP (1989).

[11] For a green light the Likelihood Ratio for these two assertions is 0.05/0.995 = 1/19.9.

[12]  This is more formally presented in the Appendix where there is also a discussion of probability and odds.

[13]  Odds of 190 to 1 correspond to a probability of 190/(190+1)=0.9948.

[14]This paragraph emphasises that there is no such thing as a 'true probability'.  The truth is that you are either over the limit or under.  The reason we have to make probability assessments is that we do not have complete information and every probability assessment is dependent (or 'conditional') on the information taken into account.

[15] Notice the similarity with the breath-tester. The data are taken from letters to the editor from Hall G.H. (1985) 291 BMJ 1424 and Seetulsingh D.(1985) 291 BMJ 1647.

[16]  In this calculation we use expected values only.  In any particular group the numbers of false positives may well vary, for reasons we cannot control - otherwise we could reduce the false positive rate.

[17] This is equating Prob(red | over limit) to Prob( over limit | red).

[18] This phrase appears to have been coined in Diaconis, P. and Freedman, D., 'The persistence of cognitive illusions'  (1981) 4 The Behavioural and Brain Sciences 333.  Although it uses an adjective as a noun it is now the phrase in common use.  The fallacy is discussed more fully, with some examples from cases in Chapter 6. 

[19] We use the term 'court' informally to mean 'tribunal of fact' as opposed to the forensic scientist. In the small percentage of cases tried on indictment this will be the jury rather than the judge.

[20] "Éunless excluded by some rule or principle of law, all that is logically probative is admissible." Thayer, J.P., A preliminary treatise on the law of evidence,  (1898) p 264.

[21] For example, polygraph or lie-detector tests seem to produce likelihood ratios of only 1.5 to 3 and to be very weak evidence. Kleinmuntz, B. and Szucko, J.J., 'A field study of the fallibility of polygraphic lie detection' Nature, 308, 449-450 (1984).

[22] Hodgkinson T. Expert Evidence Law and Practice ( Sweet and Maxwell, London, 1990)  4.

[23] R v Tran [1990] 50 A Crim R 233.  This case also involved problems with confidence intervals which are dealt with in ch 6.

[24]  Police Department of Rarotonga v Amoa Amoa  Court of Appeal of the Cook  Islands CA 3/93 11 August 1993, tbr SPLR.

[25] A recent example is Uviller, H.R. (1994) 43 Duke LJ 834, 836 et seq.

[26]  [1992] 1 NZLR 545 (CA).  The material quoted is from the trial at first instance and taken from the transcript in the Case on Appeal.

[27]  It became clear in cross-examination that 'someone else' meant 'a randomly selected member of the population'.

[28]  We have not even yet taken into account that violent burglaries are carried out by able bodied people (usually male) over about 12 and under 60, which would cut the odds down further. 

[29] From the odds, 3112/(3112+1)=0.9997.

[30] (1988) 4 NZFLR 545.

[31] Essen-Möller, E., 'Die Beweiskraft der Ähnlichkeit im Vaterschaftsnachweis; Theoretische Grundlagen' (1938) 68 Mitt Anthrop Ges (Wien) 598.

[32]  For example see Kaye, D., 'The probability of an ultimate issue; the strange cases of paternity testing'  (1989) 1 Iowa L R 75-109.

[33] The search for methods of determining uninformative priors has been a constant theme in the Bayesian literature since the time of Laplace.  see, for example, Box and Tiao, Bayesian inference in statistical analysis (Reading, Mass: Addison-Wesley, 1973)  and Jaynes 'Where do we stand on Maximum Entropy?' in Rosenkranz (Ed), E T Jaynes: Papers on Probability, Statistics and Statistical Physics, (Dordrecht:Reidel, 1983).

[34] e.g. Loveridge v Adlam [1991] NZFLR 267, Brimicombe v Maughan [1992] NZFLR 476.

[35] BGH Urt v. 12.8.1992 10 MDR (Germany) 988.

[36] Re JS ( A Minor) [1981] Fam 22, 29; [1980] 1 All ER 1061.

[37]  [1987] 1 NZLR 362.

[38]  per Casey J at p 372.

[39]  [1989] 1 NZLR 714.

[40]  Prob(Behaviour|Abuse).

[41]  Prob(Abuse|Behaviour etc.).

[42]  Prob(Behaviour|No Abuse).