WHEN ARE OBSERVATIONAL STUDIES AS STRONG AS RCTS?
Friedo Dekker, Leiden, Netherlands
Chair:
Kitty Jager, Amsterdam, Netherlands
Alison MacLeod, Aberdeen, UK
Dr. F.W. Dekker
Slide 1
Kitty Jager has presented an excellent overview of the strengths and weaknesses of all the observational study designs, and this, I guess, was her main message.
Slide 2
She showed the levels of evidence, as some people call them: for intended effects of therapy the RCT is on top, whereas observational studies serve other purposes, and other purposes have other levels of evidence. So she has already said that the RCT is the king of study designs, in the left box, but I want to think through with you why exactly that is the case.
Slide 3
I want to do that because if we really understand why it is the king of study designs for effects of therapy, we can think about alternatives, and in particular about observational studies: under what circumstances are they as credible as randomised trials, providing evidence that is just as good? That is the outline of my presentation.
Slide 4
First, we have to think about why we treat a patient. I guess it is about influencing prognosis: as doctors we have a gut feeling about what is good for a patient, and then we decide on a certain therapy. If a patient has a poor prognosis, you want to give a therapy, and together you hope to reach a good prognosis. So it is a kind of perceived prognosis that makes us treat a patient. I’ll come back to that.
Slide 5
Now, what influences prognosis? There are a number of factors. There is, of course, the clinical status of the patient, which is what we are used to thinking about, but there is also the environment the patient lives in, the patient’s genetic makeup, and demographic factors. This is not simple, because in this context we want to study the effect of therapy, and that effect has to be found in between all the other factors influencing prognosis. It is even more complicated because the therapy we prescribe will rarely be taken 100% as prescribed by our patients. And it is more complicated still because there are many, many other factors influencing prognosis as well; I omitted the arrows between them because the slide would have been unreadable. So it is a really complicated background against which we want to study the effects of therapy on prognosis.
Slide 6
Now, if we want to study the effect of therapy and prove that it is good for a patient, can we ever prove it in the context of all those factors? That is difficult, because there will always be other factors involved, and there is chance: you can be lucky and find an effect that is not there, or unlucky and find no effect while it is there. So in practice it is difficult to really prove. But what would be the ideal situation in theory? We have a patient and we want to do two things in the same patient. First, we do nothing but observe: we have a patient with a perceived prognosis, and we follow the patient over time to see what the outcome will be. Then we would like to rewind time, go back to the same patient, treat the patient, follow the patient again and see what happens with the treatment. That would be the ideal: exactly the same patient, rewinding time and doing it twice. In the literature this is called the ‘counterfactual’: the same situation, the same patient, the same circumstances, the same environment, the same time frame, everything exactly the same except for the therapy. That is theory, because in practice it is impossible to rewind time and do both in the same patient. So we need something else, and therefore we make two groups with comparable prognosis. We follow both groups, and we hope to see that the group we do not treat stays the same while the group we do treat improves in prognosis. That is how we provide evidence that a treatment really works.
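The counterfactual idea can be made concrete with a small simulation. The sketch below assumes a hypothetical model in which every patient has two potential outcomes, one without and one with treatment; all numbers are invented for illustration. Only one of the two is ever observed per patient, so the average effect is estimated by comparing two groups formed by a coin flip.

```python
import random
import statistics

# Hypothetical model: every patient has two potential outcomes, one without
# treatment (y0) and one with treatment (y1). All numbers are illustrative.
rng = random.Random(1)


def make_patient():
    baseline = rng.gauss(50, 10)  # outcome the patient would have untreated
    effect = rng.gauss(5, 2)      # this patient's own treatment effect
    return {"y0": baseline, "y1": baseline + effect}


patients = [make_patient() for _ in range(10_000)]

# The ideal counterfactual comparison: the same patient observed twice.
true_average_effect = statistics.mean(p["y1"] - p["y0"] for p in patients)

# Reality: each patient is observed under one condition only, so two groups
# are formed by a coin flip and their averages are compared instead.
treated, untreated = [], []
for p in patients:
    if rng.random() < 0.5:
        treated.append(p["y1"])    # y0 of this patient is never observed
    else:
        untreated.append(p["y0"])  # y1 of this patient is never observed

estimated_effect = statistics.mean(treated) - statistics.mean(untreated)
print(f"true average effect: {true_average_effect:.2f}")
print(f"two-group estimate:  {estimated_effect:.2f}")
```

The two printed numbers should be close: the group comparison recovers the average of effects that can never be observed patient by patient.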
Slide 7
So what is the biggest threat to the comparability of prognosis in those two groups, given all the factors influencing it?
Well, the biggest threat, I guess, is the gut feeling, mainly of the doctor and a little bit of the patient, about which therapy is needed. Doctors and patients have expectations about therapy. They don’t start a therapy haphazardly; they have a very good idea of why they do it, because there is an indication for the therapy, and that is all about the perception of prognosis. So we have an idea about the indication, and therefore we want to treat a patient with this drug instead of just following the patient. We call that confounding by indication, and that is the real problem. It shows up in your data, but it starts in your mind.
Now, how do we solve this confounding by indication when we study therapy? You have to prevent the doctor and the patient from thinking together about what is good for this patient and what the prognosis is, and then choosing the therapy based on that prognosis. Don’t let them do that. You need an external mechanism that decides which patient gets which therapy. As Kitty explained, you break the link between the perceived prognosis and the therapy you give, and that is the reason we randomise. So randomisation is not in itself meant to make the groups comparable, I’ll come back to that in a minute, but to prevent the doctor and the patient from letting their perceived prognosis influence the choice of therapy, because that causes a fundamental incomparability.
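A minimal simulation can show why breaking this link matters. It assumes, purely for illustration, that the doctor’s perceived prognosis equals the true prognosis, that treatment adds a fixed 5 points to every treated patient’s outcome, and that the doctor treats the patients who look worse. The function names and all parameters are hypothetical.

```python
import random
import statistics

rng = random.Random(2)
TRUE_EFFECT = 5.0  # assumed benefit added to every treated patient's outcome


def simulate(allocate, n=20_000):
    """Treated-minus-untreated mean outcome under an allocation rule.

    Hypothetical model: higher prognosis means a better outcome, and the
    doctor's perceived prognosis is assumed to equal the true prognosis.
    """
    treated, untreated = [], []
    for _ in range(n):
        prognosis = rng.gauss(0, 10)
        treat = allocate(prognosis)
        outcome = prognosis + (TRUE_EFFECT if treat else 0.0) + rng.gauss(0, 5)
        (treated if treat else untreated).append(outcome)
    return statistics.mean(treated) - statistics.mean(untreated)


def by_indication(prognosis):
    # The doctor treats the patients who look worse.
    return prognosis < 0


def randomised(prognosis):
    # External mechanism: a coin flip that ignores prognosis entirely.
    return rng.random() < 0.5


print(f"allocation by indication: {simulate(by_indication):+.1f}")
print(f"random allocation:        {simulate(randomised):+.1f}")
```

Under these assumptions, allocation by indication makes an effective treatment look harmful, while the coin-flip comparison recovers an effect close to 5.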
Slide 8
Now, how can we randomise? We all know we can throw a die, flip a coin or use envelopes, and these are not very safe methods. When you play a game with people on Christmas Eve, Monopoly or whatever, the die always ends up on the edge of the table and it is not very clear what is on top; it is flawed. Envelopes are better, if they are sealed and opaque, but it is still very easy to open an envelope and say: ‘I don’t like this result for this patient. I don’t want to give this patient treatment A, I want to give treatment B. So let’s put this envelope on the table and wait for the next patient, because I’m sure the next patient coming in this afternoon can get that treatment.’ So again it is easy to end up with a flawed system. That is why computer-generated allocation is better: it is unpredictable for the doctor.
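The property that matters, concealment until the patient is irrevocably registered, can be sketched in a few lines. This is an illustrative design, not a real trial system: the class `ConcealedAllocator` is hypothetical, and each draw uses Python’s standard `secrets.choice`, which knows nothing about the patient.

```python
import secrets


class ConcealedAllocator:
    """Reveal a treatment arm only when a patient is registered.

    Hypothetical sketch: each arm is drawn at registration with the
    standard-library `secrets.choice`, so there is no pre-printed sequence
    or envelope that can be inspected or held back.
    """

    def __init__(self, arms=("A", "B")):
        self.arms = arms
        self.log = {}  # patient_id -> arm, kept for audit

    def register(self, patient_id):
        if patient_id in self.log:
            raise ValueError(f"patient {patient_id} has already been allocated")
        arm = secrets.choice(self.arms)
        self.log[patient_id] = arm
        return arm


allocator = ConcealedAllocator()
for pid in ("P001", "P002", "P003"):
    print(pid, allocator.register(pid))
```

Because each code is drawn only at registration and logged, there is no envelope to hold back for a more ‘suitable’ patient, and re-registering a patient to get another draw is refused.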
Slide 9
Now, if we randomise in one of these ways, do we really get comparable groups? This is the example of the 4D study that Kitty mentioned. First, do we get groups of the same size if we randomise? Is the number of patients per group the same? It is not: here it is 636 and here it is 619. Why is that? It is random; it is just a matter of chance. If you randomise, the groups can be of unequal size, and if the total groups can be of unequal size, the subgroups can be too. We see, for instance, that age is very comparable, but for smokers, and especially former smokers, there are more in the smaller group. So there is a 5% difference in former smokers, and that is only due to chance.
There are of course many more baseline characteristics in this table, but looking at this part, would you say that the prognosis is really identical? It is not, with respect to smoking. Here the advantage is in the placebo group, I guess, and for some other factors the advantage may be in the other group. You can hope that overall it balances out, but even if it balances out exactly in this study, I’m sure that in the next study prognosis will overall be slightly better in one group, and if I repeat the study with another trial, prognosis will be slightly better in the other group. So there will always be some differences, and with some 30,000 genes they will never be exactly equally distributed between the groups. We hope to get equal prognosis, but there is no guarantee. And if it is not guaranteed, I’m not sure that the aim of randomisation is to get equal prognosis. The real aim is to prevent the doctor from choosing the therapy because of the perceived prognosis, because that would cause serious confounding by indication.
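The chance imbalance described for the 4D study can be checked with simple binomial arithmetic: under 1:1 simple randomisation of 1255 patients, the size of one arm has a standard deviation of about 17.7, so 636 versus 619 (z ≈ 0.48) is entirely ordinary. The second part repeats a hypothetical trial of the same size many times for a baseline trait with an assumed 30% prevalence, to show that the between-arm difference favours one arm in some trials and the other arm in others.

```python
import math
import random

# Group sizes quoted for the 4D study: 636 versus 619 patients.
n_total = 636 + 619
expected_per_arm = n_total / 2
# Under 1:1 simple randomisation the size of one arm is Binomial(n, 0.5).
sd_per_arm = math.sqrt(n_total * 0.5 * 0.5)
z = (636 - expected_per_arm) / sd_per_arm
print(f"expected {expected_per_arm} per arm, SD {sd_per_arm:.1f}, z = {z:.2f}")

# Repeat a hypothetical trial of the same size, tracking a baseline trait with
# an assumed prevalence of 30% (illustrative, not a 4D characteristic).
rng = random.Random(4)


def trait_difference(n=n_total, prevalence=0.30):
    """Share with the trait in arm A minus share in arm B for one trial."""
    size = {"A": 0, "B": 0}
    with_trait = {"A": 0, "B": 0}
    for _ in range(n):
        arm = "A" if rng.random() < 0.5 else "B"
        size[arm] += 1
        if rng.random() < prevalence:
            with_trait[arm] += 1
    return with_trait["A"] / size["A"] - with_trait["B"] / size["B"]


diffs = [trait_difference() for _ in range(500)]
print(f"difference ranged from {min(diffs):+.3f} to {max(diffs):+.3f}")
print(f"arm A had more of the trait in {sum(d > 0 for d in diffs)} of 500 trials")
```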
Slide 10
If that is the aim, not to get guaranteed equal groups by definition but to prevent the doctor from choosing, then we can think about other schemes for allocating treatment. For instance, if I simply alternate, A/B, A/B, A/B, would that be a correct system? In a sense it would, because it has the same properties as randomising patients: on average, if I did it for everyone in this room, I would get more or less equal numbers, not much different from the 4D study.
So the numbers will be the same, and age and sex will be more or less the same. Again, you cannot choose the treatment, because a system chooses it: A/B, A/B is my system. So why is it wrong? Because it is predictable. If I know the sequence is A/B, A/B, A/B, you know you will get B, and if you don’t want B, you say, ‘I’m sorry, I have to leave the room for a moment’, and then the next person gets B. Because it is predictable, it is not good.
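How predictable alternation is can be quantified by letting a doctor guess each allocation from the ones before it. The sketch is illustrative; `guess_alternation` stands for a hypothetical doctor who has noticed the A/B pattern. With strict alternation every guess is right, while against a computer-generated random sequence the same rule is right only about half the time.

```python
import random


def guess_rate(sequence, guesser):
    """Fraction of allocations correctly predicted from the ones before them."""
    hits = sum(guesser(sequence[:i]) == sequence[i] for i in range(1, len(sequence)))
    return hits / (len(sequence) - 1)


def guess_alternation(history):
    # Hypothetical doctor who has noticed the A/B pattern: predict the other arm.
    return "B" if history[-1] == "A" else "A"


rng = random.Random(5)
alternating = ["A" if i % 2 == 0 else "B" for i in range(1000)]
random_sequence = [rng.choice("AB") for _ in range(1000)]

print(f"alternation predicted: {guess_rate(alternating, guess_alternation):.0%}")  # 100%
print(f"random predicted:      {guess_rate(random_sequence, guess_alternation):.0%}")
```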
In fact, in the 1940s and 1950s, when the first trials started, allocation was often not real randomisation but alternation. That was acceptable as long as the sequence was not predictable and not known.
Could I instead use a computer that appears to randomise, but whose secret system is not really random: it simply checks whether the date of birth is even or odd? That is a perfect system, as long as nobody knows. If I ask for your postal code and check whether it is even or odd, I can use that as a system, as long as nobody knows. So you can have many external mechanisms for allocating treatment, as long as they make it impossible for the doctor to let the prognosis influence what he or she gives.
So if it is not predictable, if the treatment allocation is concealed, as we say, it can be acceptable. Of course, you can never be sure that nobody discovers my secret postal-code scheme, so in practice we don’t do it, but in principle it is valid.
So the mechanism must not be related to prognosis. If you really understand that randomisation is not a guarantee of equal prognosis but a way of preventing the doctor from choosing, then we have come a long way already.
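The requirement that the mechanism be unrelated to prognosis can be illustrated with two deterministic rules applied to a hypothetical cohort. Parity of the day of birth (simplified here to days 1 to 28) has nothing to do with age, so it balances age between arms; a rule based on age, a prognostic factor, does not. The cohort and both rules are assumptions made for illustration, and a real parity scheme would still have to stay secret to remain unpredictable.

```python
import random
import statistics

rng = random.Random(6)
# Hypothetical cohort: day of birth (1-28 for simplicity) and age in years.
cohort = [{"birth_day": rng.randint(1, 28), "age": rng.gauss(62, 12)}
          for _ in range(10_000)]


def mean_age_gap(rule):
    """Mean age in arm A minus arm B, where `rule(patient)` is True for arm A."""
    arm_a = [p["age"] for p in cohort if rule(p)]
    arm_b = [p["age"] for p in cohort if not rule(p)]
    return statistics.mean(arm_a) - statistics.mean(arm_b)


def even_birth_day(p):
    return p["birth_day"] % 2 == 0  # unrelated to prognosis by construction


def older_patient(p):
    return p["age"] > 65  # age is a prognostic factor


print(f"birth-day parity: {mean_age_gap(even_birth_day):+.2f} years")
print(f"age-based rule:   {mean_age_gap(older_patient):+.2f} years")
```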
Slide 11
Now, let’s think about an example. Suppose I want to test the effect of an ACE inhibitor on blood pressure in PD patients. What would be the best study design? I guess simply a randomised trial, because this is an intended effect: with an ACE inhibitor we hope to lower blood pressure, so that is the prognosis we are aiming at, and therefore we have to randomise. If I want to do the same study with another outcome, mortality in PD patients, what would then be the best study design? Again, I guess, an RCT, because with this ACE inhibitor we also hope for an intended effect: that prognosis with respect to mortality improves. So again you definitely need an RCT, a real blinded randomised controlled trial.
Slide 12
Now a different one. I want to study the effect of ACE inhibitors on peritoneal transport status in PD patients. Nowadays this is perhaps somewhat known, but a few years ago nobody knew that ACE inhibitors block vascular growth in the peritoneal membrane and might therefore, as a kind of side effect of therapy, preserve the peritoneal membrane. So some years ago there was not a single doctor in Europe who prescribed ACE inhibitors with the purpose of preserving the peritoneal membrane. If you accept that, then of course you can do an RCT, but you can also do a retrospective observational study. Why? Because it is an unintended effect: there was no prognosis regarding transport status in the mind of the doctor. Since the effect was unintended and not previously known, there could be no confounding by indication and no selection bias. The decision whether to give an ACE inhibitor to a PD patient was based on blood pressure, but certainly not on membrane status, the risk of membrane failure, or the prognosis with respect to transport status. So for this outcome there is no confounding by indication.
Of course, there is an assumption here: that the prescription was indeed made because of blood pressure, and that blood pressure itself is not related to future transport status. You could argue about that, but if you are willing to accept it, then prescribing on the basis of blood pressure alone is random with respect to the prognosis of the membrane. Therefore, a retrospective observational follow-up study is a valid design.
Now, many people think that prospective studies are always better than retrospective studies, and many of you will believe it. In this particular example, would a prospective follow-up study be as good as, or better than, a retrospective one? Imagine we agree today to study this in a prospective observational study. We all agree to start in your new PD patients: you prescribe ACE inhibitors, and we want to see whether the peritoneal membrane is preserved over time. But can we agree that you prescribe them only because of blood pressure and not because of this hypothesis? That is not possible, because you know my story and you know what we are aiming at. So a prospective study is no longer valid here; we cannot do it. A retrospective study is much better in this case, and there are more examples like it.
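The two assumptions in this example, that prescription followed blood pressure only and that blood pressure does not itself affect transport status, and the problem with the prospective design, can be explored in one hypothetical simulation. Everything in it is assumed: a standardised blood pressure, a latent membrane risk that doctors are taken to perceive correctly, a true treatment effect of -0.5 on the rise in a transport score (arbitrary units), and, for the prospective scenario, doctors who also prescribe to patients whose membrane risk looks high. It is a sketch of the reasoning, not a model of NECOSAD.

```python
import random
import statistics


def naive_effect(n=20_000, true_effect=-0.5, bp_on_outcome=0.0,
                 prescribe_on_membrane_risk=False, seed=7):
    """Treated-minus-untreated mean rise in a transport score (arbitrary units).

    Hypothetical model: ACE inhibitors are prescribed when standardised blood
    pressure exceeds 0.5. If `prescribe_on_membrane_risk` is True, doctors who
    know the hypothesis also prescribe when perceived membrane risk exceeds 0.5.
    `bp_on_outcome` lets blood pressure itself change the outcome.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        bp = rng.gauss(0, 1)             # standardised blood pressure
        membrane_risk = rng.gauss(0, 1)  # prognosis for transport status
        on_ace = bp > 0.5
        if prescribe_on_membrane_risk and membrane_risk > 0.5:
            on_ace = True
        rise = (membrane_risk + bp_on_outcome * bp
                + (true_effect if on_ace else 0.0) + rng.gauss(0, 1))
        (treated if on_ace else untreated).append(rise)
    return statistics.mean(treated) - statistics.mean(untreated)


print(f"retrospective, BP unrelated to outcome: {naive_effect():+.2f} (assumed truth -0.50)")
print(f"retrospective, BP affects outcome:      {naive_effect(bp_on_outcome=0.5):+.2f}")
print(f"prospective, hypothesis known:          "
      f"{naive_effect(prescribe_on_membrane_risk=True):+.2f}")
```

Only the first scenario recovers the assumed effect. When blood pressure also influences the outcome, or when the hypothesis guides prescribing, the naive comparison is biased, and in this toy model it even points in the wrong direction.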
Slide 13
We actually did this in a cohort study, NECOSAD, in patients with PD as their first treatment who had already been on PD for two years and had a standard CAPD regimen of 8-10 L/day during those two years. 120 of those patients used ACE inhibitors for at least 25% of the time, while 87% did not use these drugs at all. At the start, luckily, the D/P ratio, which is a somewhat crude outcome measure for peritoneal transport, was indeed the same in both groups. Over time it increased in the controls, as we expected, but in the patients on ACE inhibitors or ARBs it was much lower, and the interaction with time was significant.
Slide 14
So this example worked well. The conclusion from this single example is that ACE inhibitors seem to protect the peritoneal membrane in these patients, and that under strict conditions, in this specific circumstance of unintended effects, a retrospective observational study can be as good as a randomised controlled trial. Here the retrospective observational study is much more valid than a prospective observational study, and it is certainly much more efficient if you already have the data.
Slide 15
So this is about unintended effects of therapy: a positive side effect, so to speak, preservation of the membrane. But there are also adverse events, negative side effects of therapy. What about those? Take the example Kitty used, pure red cell aplasia, and suppose we suspect that it is related to subcutaneous versus intravenous Epo administration. If I want to study that, can I use an RCT? Of course you can, but it is a lot of work and you will need many patients, because it is a very rare event.
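How many patients an RCT would need can be estimated from the probability of seeing even one event: with a risk p per patient, the chance of at least one case among n treated patients is 1 - (1 - p)^n, which for a 95% chance gives roughly 3/p patients (the ‘rule of three’). The risks in the loop are hypothetical round numbers, not the actual incidence of pure red cell aplasia.

```python
import math


def patients_needed(risk, prob_at_least_one=0.95):
    """Smallest n for which P(at least one event among n patients) >= target.

    Solves 1 - (1 - risk) ** n >= prob_at_least_one for n.
    """
    return math.ceil(math.log(1 - prob_at_least_one) / math.log(1 - risk))


for one_in in (1_000, 10_000, 100_000):  # hypothetical risks: 1 in one_in
    n = patients_needed(1 / one_in)
    print(f"risk 1 in {one_in:>7,}: about {n:>9,} treated patients "
          f"(rule of three: {3 * one_in:,})")
```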
Now think about it: in previous years, was the choice of the route of administration related to the risk of pure red cell aplasia? It was not. It certainly is now, but not before the first reports emerged. The risk was unknown, so with respect to it the choice was random, unbiased. So if you had done the study some years ago, or if you do it now in retrospective data, it is again as valid as an RCT. You don’t need a randomised controlled trial for this.
Slide 16
So RCTs are usually too small to study adverse events, especially rare ones. Most of the time an RCT is not necessary; retrospective studies are much more efficient for that. Remember that the first report on pure red cell aplasia that Kitty showed was only a case report, and it is perfectly possible to get a case report, just a few cases, published in the New England Journal of Medicine. We would all love to have a case report in the New England Journal. So this can be a very strong study design.
Slide 17
As I said, adverse events of therapy can be studied validly in observational studies, but only under certain conditions: the risk of a specific patient getting the adverse event is unknown at prescription; the indication to prescribe is unrelated to the risk of the adverse event itself; and the adverse event has prognostic factors other than those related to the disease we prescribe for, because otherwise we would have indirect confounding by indication. These conditions usually hold. Sometimes there is a clearly defined subgroup for which they do not hold, and then you can simply restrict the study to the patients for whom they do hold.
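Indirect confounding by indication, and the remedy of restriction, can be illustrated with a hypothetical cohort in which the drug has no effect at all on the adverse event, but disease severity both drives prescribing and raises the risk of the event. All probabilities are invented for illustration.

```python
import random

rng = random.Random(10)


def simulate_cohort(n=200_000):
    """Hypothetical cohort in which the drug has NO effect on the adverse event."""
    rows = []
    for _ in range(n):
        severe = rng.random() < 0.30                         # disease severity
        on_drug = rng.random() < (0.80 if severe else 0.20)  # indication
        event = rng.random() < (0.04 if severe else 0.01)    # depends on severity only
        rows.append((severe, on_drug, event))
    return rows


def risk_ratio(rows):
    """Risk of the event in drug users divided by the risk in non-users."""
    exposed = [event for _, on_drug, event in rows if on_drug]
    unexposed = [event for _, on_drug, event in rows if not on_drug]
    return (sum(exposed) / len(exposed)) / (sum(unexposed) / len(unexposed))


cohort = simulate_cohort()
restricted = risk_ratio([row for row in cohort if not row[0]])
print(f"crude risk ratio:                 {risk_ratio(cohort):.2f}")
print(f"restricted to non-severe disease: {restricted:.2f}")
```

The crude risk ratio suggests that the drug roughly doubles the risk, whereas within the non-severe subgroup, where severity no longer differs between users and non-users, the ratio is close to 1.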
Slide 18
This was about intended effects and unintended side effects of therapy. Now something completely different.
Suppose I have a secret therapy, a kind of gene therapy, and I am able to randomly allocate a certain part of a gene, a SNP, to a patient. Don’t tell anybody, but I can. Then I could do a randomised trial: plug in this SNP in some random patients and not in others. That would be a real randomised controlled trial with my SNP, and it would be much stronger than observational association studies, wouldn’t it? If I had this kind of gene therapy, it would be a miracle. But how are our genes allocated in nature? Completely at random: my parents did it for me. I cannot influence it. It is random whether I or my brother got this SNP. So in fact it is already randomised. This is what is called Mendelian randomisation: allocation is already random, so we don’t need randomised trials, and observational studies are perfectly suitable for studying genetic effects.
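The allocation step behind Mendelian randomisation can be sketched as a simulation in which each parent passes on one of two alleles with probability 1/2, regardless of the child’s environment. The parents’ genotypes, the 30% smoking prevalence and the function name `child_genotype` are hypothetical. As in a randomised trial, carriers and non-carriers end up with similar shares of smokers, here by construction of the model.

```python
import random

rng = random.Random(11)


def child_genotype(mother, father):
    """Each parent passes on one of their two alleles with probability 1/2."""
    return (rng.choice(mother), rng.choice(father))


# Hypothetical heterozygous parents ("a" = variant allele) and a child's
# environmental exposure decided without looking at the genotype.
children = []
for _ in range(20_000):
    carrier = "a" in child_genotype(("A", "a"), ("A", "a"))
    smoker = rng.random() < 0.30
    children.append((carrier, smoker))


def share_smokers(carrier_status):
    group = [smoker for carrier, smoker in children if carrier == carrier_status]
    return sum(group) / len(group)


carrier_share = sum(carrier for carrier, _ in children) / len(children)
print(f"carriers: {carrier_share:.3f} (Mendel predicts 0.75)")
print(f"smokers among carriers:     {share_smokers(True):.3f}")
print(f"smokers among non-carriers: {share_smokers(False):.3f}")
```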
Now perhaps you are getting a little uncomfortable, because I am selling you the strengths of genetic association studies and saying they are as valid as RCTs. But then why don’t people believe them? We have to think about that.
Slide 19
Now, there is a kind of axis of haphazardness of exposure, with the RCT on top. If you randomise, it is really completely haphazard which patient gets which therapy: perfectly haphazard. If the genes are randomised, by us or by nature, it is almost completely haphazard. Perhaps not in certain populations with ethnic differences, but then you simply restrict to a homogeneous population.
For unknown adverse events of therapy, exposure is quite haphazard, although you could argue about that in a specific case. For an unintended positive effect like the ACE inhibitor example, it can be very haphazard; not in all observational studies, but there are these nice examples. For an intended effect of therapy, exposure is hardly haphazard. Is it completely non-haphazard? There is always some random element, so it is always a little haphazard, but too little. So this is a kind of axis of evidence.
Slide 20
Now, why do we feel uncomfortable about my selling genetic association studies as being as strong as randomised trials? You are right that perhaps it is not the very best study design. But why? Not because it isn’t randomised; there is another issue. Think about how many genes we have. Nobody knows exactly: 30,000 or more. How many SNPs are there in them? Millions. If I do a whole-genome scan with thousands of markers and relate them to my favourite disease in cases and a control group, an association study, I can find many positive associations. How many of those SNPs could plausibly be functionally related to my disease? Still a lot. So I may find one very significant positive association, but what was the prior chance that this particular association is real? Very, very low, perhaps only 1 in 100,000. So if I test 100,000 SNPs and find a number of positive associations, perhaps only one of them is really true and will be replicated in other studies.
So even after one positive study with a significant effect, an odds ratio of 2 or whatever, the chance that the effect is true and will be replicated is still quite low; it is only twice as high as before. Therefore, we tend not to believe a single association study, not because it isn’t randomised, but because the prior chance was low and therefore the posterior chance is low.
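The arithmetic behind this argument can be written out for a hypothetical genome scan. The sketch assumes 100,000 independent tests at a significance level of 0.05, exactly one truly associated SNP (the prior of 1 in 100,000 mentioned above), and a power of 80% to detect it; these values are assumptions, not results from any study.

```python
# Hypothetical genome scan; every value below is an assumption.
n_snps = 100_000  # independent tests
alpha = 0.05      # significance level per test
power = 0.80      # chance of detecting a truly associated SNP
n_true = 1        # prior of 1 in 100,000: one real association

expected_false_hits = (n_snps - n_true) * alpha
expected_true_hits = n_true * power
share_real = expected_true_hits / (expected_true_hits + expected_false_hits)

print(f"expected false-positive SNPs: {expected_false_hits:,.0f}")
print(f"expected true-positive SNPs:  {expected_true_hits:.1f}")
print(f"share of significant SNPs that are real: {share_real:.5f}")
```

Under these assumptions, thousands of false-positive SNPs are expected against less than one true one, so a single significant hit is far more likely to be a false positive than a real association.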
Slide 21
So there is also another axis, which you could call multiplicity. For genetic association studies the prior is low, and therefore you are inclined not to believe the first single positive study. That is very different from randomised trials, because when do we do a randomised trial? Only when we are really in equipoise, as we say: we think it could be 50:50, the drug may work or it may not, I don’t really know. A 50% chance. So then you do a controlled trial, and my prior is 50%. If I have a prior of 50% and then a positive result, I can estimate the chance that the effect is really true and will be replicated. You can do this with a 2x2 table, just as with a diagnostic test: the test result, positive or negative, is my study, the prior chance is whether the effect is true or not, and I can calculate the predictive value. It is just Bayesian reasoning. So after a single positive and significant trial, the posterior chance is quite high, and therefore we are inclined to believe a single randomised trial, while we are not inclined to believe a single genetic association study. But that has nothing to do with the study being observational or non-random.
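The 2x2-table reasoning can be written as a positive predictive value, with the study playing the role of the diagnostic test: power as sensitivity and 1 minus the significance level as specificity. The power of 80% and the significance level of 0.05 are assumed values, and `posterior_true` is a hypothetical helper name.

```python
def posterior_true(prior, power=0.80, alpha=0.05):
    """P(hypothesis true | significant result), from a 2x2 table.

    Rows: hypothesis true / false. Columns: study positive / negative.
    As with a diagnostic test, power is the sensitivity, 1 - alpha the
    specificity, and the result is the positive predictive value.
    """
    true_positive = prior * power
    false_positive = (1 - prior) * alpha
    return true_positive / (true_positive + false_positive)


print(f"RCT in equipoise, prior 50%: {posterior_true(0.5):.3f}")
print(f"gene scan, prior 1/100,000:  {posterior_true(1 / 100_000):.5f}")
```

With a 50% prior the posterior is about 0.94, while with a prior of 1 in 100,000 it stays well below 1 in 1,000.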
Slide 22
So once we understand the purpose of randomisation, breaking this link and preventing the doctor from choosing the therapy because of perceived prognosis, we can think about the strengths of observational studies. If we do that, sometimes you really have to think hard, because it is not according to the standard textbooks or the easy lists of levels of evidence. It is not as simple as saying that every guideline should have level A evidence for all its elements, because for side effects and other things that is impossible and unnecessary; we don’t want it, and other studies can be much more valid. So it is difficult, because you have to think about it. You did a good job this morning in really thinking about this. But there is some help, for instance the STROBE report, which is not only about how to report your study but also about how to conduct it so that you can make a good paper out of it.
Slide 23
As a final acknowledgement, these are the two papers that were a great inspiration to me on the credibility of observational studies, representing two views on it. Thank you.