Causality & Randomness

Doctors, scientists, and companies are all concerned with determining causality. For example, suppose a doctor sees many patients who have become sick with a new strain of flu. She decides to study the virus and develop a vaccine to prevent more patients from getting the flu. Suppose she develops the vaccine and it is administered. No one who received the vaccine gets the new strain of flu. Can we conclude that her vaccine is effective in preventing the flu?

Correlation

It is a common mistake to assume that correlation, informally defined as a sort of association, implies causality. In the previous example, patients who received a flu vaccine did not get the flu. In other words, there is a correlation between receiving the vaccine and not getting the flu: if someone got the vaccine, they did not get the flu. We might at first believe that this correlation means the vaccine was effective. But what if everyone who got the vaccine also actively avoided people who had the flu and took care to wash their hands after coming in contact with someone who had the flu? We cannot assume that just because there is an association between two variables, there is an underlying cause-effect relationship between them. Correlation does not imply causation.
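The hand-washing scenario can be made concrete with a small simulation. The sketch below is hypothetical: the function name, the `careful` trait, and every probability in it are invented for illustration and do not come from any real study. In this made-up world the vaccine does nothing at all, yet vaccinated people still get the flu less often, because careful people are both more likely to get vaccinated and less likely to catch the flu.

```python
import random

def simulate_useless_vaccine(n=10_000, seed=0):
    """Return flu rates keyed by vaccination status (True/False).

    All numbers here are assumptions chosen for illustration only.
    """
    rng = random.Random(seed)
    flu_counts = {True: 0, False: 0}
    group_sizes = {True: 0, False: 0}
    for _ in range(n):
        # Hidden trait: does this person avoid sick people and wash hands?
        careful = rng.random() < 0.5
        # Careful people are more likely to choose to get vaccinated.
        vaccinated = rng.random() < (0.8 if careful else 0.2)
        # Flu risk depends only on being careful, never on the vaccine.
        got_flu = rng.random() < (0.05 if careful else 0.30)
        group_sizes[vaccinated] += 1
        flu_counts[vaccinated] += got_flu
    return {v: flu_counts[v] / group_sizes[v] for v in (True, False)}

rates = simulate_useless_vaccine()
print(f"flu rate, vaccinated:   {rates[True]:.3f}")
print(f"flu rate, unvaccinated: {rates[False]:.3f}")
```

The gap between the two printed rates is an association produced entirely by the hidden trait, not by the vaccine.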

If correlation implied causation, we might conclude that eating more mozzarella cheese causes someone to go and receive a doctorate in civil engineering, simply because the two happen to be correlated. But this is definitely not the case. Correlation alone is not enough to establish causation.

Confusing correlation with causation can have serious consequences. In 1896, a statistician by the name of Frederick Hoffman published a report which concluded that African Americans were uninsurable due to reduced life expectancy. He failed to account for confounding factors such as poverty, lack of schools and modern plumbing, and other institutional factors that Black Americans faced (and still face today). In other words, he confused correlation with causation. However, this report was accepted by insurance companies, which meant that Black Americans were systematically discriminated against in insurance policies. We will explore this further in later parts of the course.

Randomization

To establish causation, we can make use of randomness. First, it is helpful to define a few key terms:

Treatment group: Those who receive a treatment in a study. Note that "treatment" is not necessarily used in the medical sense. In the original example, the treatment group would be those who received the vaccine, but a treatment group does not have to receive any medical treatment. For example, if we want to study whether or not coffee causes lung cancer, the treatment group would be those who drink coffee.‌

Control group: Those who do not receive a treatment in a study. In the original study no control group was mentioned, but if we wanted to set up an experiment to test whether or not the doctor's vaccine was effective, the control group would be those who do not receive the vaccine. Similarly, in the coffee example, the control group would be those who do not drink coffee.‌

In a randomized control experiment, individuals are randomly assigned to the treatment and control groups. With enough participants, random assignment makes the two groups similar, on average, in every respect except the treatment. If the doctor in the original example had randomly assigned enough patients to each of her treatment and control groups, and the people in the treatment group did not get the flu while those in the control group did, then we could conclude that the vaccine was effective.
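Random assignment itself is simple to carry out. The following is a minimal sketch, not a prescribed procedure: the helper `assign_groups` and the patient labels are hypothetical names introduced here. It shuffles the participants and splits the shuffled list in half, so that chance alone decides who ends up in each group.

```python
import random

def assign_groups(participants, seed=None):
    """Randomly split participants into (treatment, control) halves.

    Hypothetical helper for illustration; pass a seed to make the
    assignment reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the input is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Made-up participant labels, used only to demonstrate the helper.
patients = [f"patient_{i}" for i in range(10)]
treatment, control = assign_groups(patients, seed=42)
print("treatment:", treatment)
print("control:  ", control)
```

Because the participants do not choose their own group, traits such as caution or diet have no way to influence who receives the treatment.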

Example

A study followed 1045 people with cardiovascular disease, randomly selected from hospital patients. Three months later, those who owned a cat were six times more likely to be alive than those who didn’t.‌

a. True or false: This is a randomized control experiment.

b. True or false: This shows that for someone with cardiovascular disease, owning a cat causes someone to live longer.

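a. False: Participants were not randomly assigned to the treatment group (having a cat) and the control group (not having a cat). Those who already owned a cat made up the treatment group, so group membership was not decided at random. Randomly selecting the participants from hospital patients is not the same as randomly assigning them to groups.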

b. False: Because this was not a randomized control experiment, the two groups may differ in ways other than cat ownership, so the study alone cannot establish causality.

Confounding Factors

A confounding factor is an underlying difference between the treatment and control groups, other than the treatment itself, that can mislead us when determining causality. For example, in the study of cardiovascular disease and cats, a confounding factor could be that cat owners are more likely to eat healthily.

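By randomly assigning individuals to control and treatment groups, we make it likely that confounding factors are balanced between the two groups, so they are unlikely to explain a difference in outcomes.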
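A short simulation can illustrate why random assignment helps. This is a sketch under invented assumptions: the function name, the 30% trait rate, and the population size are hypothetical. Each simulated person either has some hidden trait (for instance, eating healthily) or does not, and after random assignment the share of people with that trait comes out nearly the same in both groups.

```python
import random

def trait_share_after_random_assignment(n=20_000, trait_rate=0.3, seed=1):
    """Return the share of people with a hidden trait in each group.

    The trait rate and population size are made-up illustration values.
    """
    rng = random.Random(seed)
    # True means the person has the hidden trait, False means they do not.
    has_trait = [rng.random() < trait_rate for _ in range(n)]
    # Random assignment: shuffle everyone, then split the list in half.
    rng.shuffle(has_trait)
    half = n // 2
    treatment, control = has_trait[:half], has_trait[half:]
    return sum(treatment) / len(treatment), sum(control) / len(control)

treatment_share, control_share = trait_share_after_random_assignment()
print(f"share with trait, treatment: {treatment_share:.3f}")
print(f"share with trait, control:   {control_share:.3f}")
```

The two shares are rarely identical, but with large groups they differ only by chance, which is what lets a large difference in outcomes be attributed to the treatment.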

It may not always be possible to carry out a randomized control experiment. If someone wanted to study the effects of smoking during pregnancy, it is unlikely that people randomly chosen to be in the treatment group would be willing to smoke throughout their pregnancy. In cases like this, it is important to watch out for confounding factors.‌

Checkpoint

A scientist synthesizes a new drug that she believes treats dengue fever. She performs an experiment in which she finds 1000 dengue patients. Half of them are randomly assigned to receive the drug and half of them take a placebo (a harmless substance that is intended to have no effect). Of those who take the drug, a much larger proportion recover than of those who take the placebo. At the same time, mosquito repellent is distributed to stop mosquitoes (which spread dengue virus).

a. True or false: This is a randomized control experiment.

b. True or false: This experiment allows us to conclude that taking the drug causes a recovery from dengue.

c. True or false: Mosquito repellent is a confounding factor in this experiment.
