COMPACT PRIMER AND UPDATE FOR THE NEPHROLOGIST

CLINICAL EPIDEMIOLOGY

THE PREPARATION AND CONDUCT OF CLINICAL STUDIES

Kitty Jager, Amsterdam, the Netherlands
   

 

Dr. K.J. Jager
ERA-EDTA Registry - Department of Medical Informatics,
Academic Medical Center, University of Amsterdam,
Amsterdam, The Netherlands


Slide 1

 

Slide 2

 

 

Slide 3

That means that in a study question you should describe the relationship between dependent and independent variables, for example: what is the relationship between statins and cardiovascular mortality? You should describe in whom you want to do this study, for example in haemodialysis patients. You should describe the expected character and direction of the relationship, for example: we think that the use of statins will reduce cardiovascular mortality in haemodialysis patients. And you should describe over what period you expect this to be the case, for example over a period of 4 years. Once you have done all this and put it together, you have formulated the study question of the 4D study. But you should also look at whether your study is feasible, which is very important of course, whether it’s interesting, whether it’s novel, meaning that it adds something new to current knowledge, and of course also whether it’s ethical and whether it’s relevant.

Slide 4

After having formulated your first study question, you should start background reading, read journal reviews on this specific topic but please do not forget that also registries or statistics offices may have information that you may use in planning your study. It may also turn out that the literature has already, at least partially, answered your study question. On the basis of what you’ve read you should further define the study question. But please be aware that the literature may also provide a biased view, that’s what we call publication bias.

Slide 5

What is publication bias? Publication bias means that on a specific subject mainly the positive results of studies have been published, but not the negative ones. Publication bias can be detected by making a funnel plot in a meta-analysis, and you can see a funnel plot here. The horizontal axis shows the effect size found for a particular research question, which is either a positive or a negative effect, and the vertical axis represents the size of the study. What you can see is that the larger the study, the less variation there is in the resulting effect sizes. When the funnel plot is symmetrical, this indicates that there is no publication bias, because among the smaller studies there are as many giving positive results as there are giving negative results. However, if you find an asymmetrical funnel plot, this does suggest publication bias, because among the smaller studies you mostly find the positive ones, whereas there is a lack of studies that provide a negative result. How come? Well, first of all I think journal editors are biased, because they prefer to publish positive studies, but authors also think that when their study gives a negative or an equivocal result, it is not useful to submit it. But it is useful to submit those studies, because when people after you make a meta-analysis, all those studies together can provide a good answer to the study question.
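
The asymmetry described here can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true effect is set to zero, the study sizes are invented, and the selection rule (small studies are published only when their estimate is positive) is a deliberately crude stand-in for real editorial and author behaviour. The function names are hypothetical.

```python
import random
import statistics


def simulate_studies(n_studies, true_effect=0.0, seed=1):
    """Simulate (effect_estimate, sample_size) pairs for hypothetical studies."""
    rng = random.Random(seed)
    studies = []
    for _ in range(n_studies):
        n = rng.choice([20, 50, 100, 500, 2000])  # invented study sizes
        se = 1.0 / n ** 0.5                        # larger study, smaller standard error
        studies.append((rng.gauss(true_effect, se), n))
    return studies


def publish(studies, small_cutoff=100):
    """Assumed selection rule: small studies are published only if positive."""
    return [(e, n) for e, n in studies if n > small_cutoff or e > 0]


def mean_effect(studies, small=True, small_cutoff=100):
    """Mean effect estimate among the small (or the large) studies."""
    chosen = [e for e, n in studies if (n <= small_cutoff) == small]
    return statistics.mean(chosen)


all_studies = simulate_studies(2000)
published = publish(all_studies)
print("small studies, all:      ", mean_effect(all_studies, small=True))
print("small studies, published:", mean_effect(published, small=True))
```

With all studies included, the small studies scatter symmetrically around the true effect; after selection, their mean shifts towards a positive effect, which is what an asymmetrical funnel plot shows.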

Slide 6

Then what about a study protocol? I will elaborate a bit on study designs, on how to choose a study population and on methodology.

Slide 7

First of all, the study design depends on the study question. Most of you will know the hierarchy of study designs that is published in the literature. It starts at the bottom with case reports, which are said to provide only weak evidence. It goes up with case-control studies, via cohort studies, to randomised controlled trials, which are expected to give the best evidence for a particular question. So we can say that the potential to establish causality in studies on the effect of therapy increases in this direction. However, please note that the increase in evidence from the bottom to the top only applies to studies on the effects of therapy or other interventions, and I will go into detail later.

Slide 8

Just some examples of different study designs. This is the design of a case report or a case series. The exposure is assessed in one patient, that’s a case report, or in more patients, and then it is described what outcome came later. Sometimes it’s reversed: there’s an assessment of the outcome and then they look back in time to assess the exposure. An example. In a series of 13 patients with chronic renal failure, Casadevall described the relationship between the use of epoetin and the development of pure red cell aplasia (PRCA). This is an example of the discovery of a rare adverse event, and this is exactly one of the strengths of a case report or a case series. Mostly it is the first form of publication for new diseases, rare adverse events or rare manifestations of disease. It’s frequently done because it’s fast and inexpensive, and it may generate hypotheses for further investigations. For example, after this publication a number of studies were done which showed that there was indeed a relationship between the use of epoetin and PRCA. However, case reports have very limited potential to make causal inferences, to prove causality, except in dramatic cases. By this I mean, for example, that when penicillin was given for pneumonia for the first time, the effect was so striking that it was very probable that there was a relationship between the penicillin and the recovery of the patient.

Slide 9

Then cross-sectional studies. What is a cross-sectional study? When you do a cross-sectional study, you assess the exposure and the outcome in a particular population at the same time, so there is no relationship with time. Hallan and co-workers used data from population-based surveys, which are a type of cross-sectional study, when they sought to explain the difference in the incidence of ESRD between Norway and the US, and they compared the prevalence of CKD. Despite a much lower incidence of ESRD in Norway, the prevalence of CKD was similar in both countries.

Then what are the strengths and the weaknesses of a cross-sectional study? Cross-sectional studies can assess prevalence and burden of disease, again they are fast and inexpensive and they are hypothesis generating. There’s also an important weakness, there is very limited potential to make causal inferences because the time order of the exposure and the outcome cannot be determined.

Slide 10

What about case-control studies? What you do in a case-control study is that in a population you select cases on the basis of outcome. For example, you select cases with a particular disease and then you also select controls from that same population and then you look back to see what the exposure was in the past, in cases and in controls. An example of a case-control study is the one of Parikh and co-workers who aimed to determine pre-natal and perinatal factors associated with the development of renal agenesis. Cases were those live birth infants who had renal agenesis and the controls were a random sample of all births within a state-wide birth registry that were reported not to have renal agenesis. After adjustment for a number of factors pre-existing maternal diabetes was associated with a high risk of renal agenesis and the data also suggested a positive association with exposure to alcohol.
Now what are the strengths of case-control studies? It’s a very efficient study design, especially when you look at rare outcomes, and renal agenesis is, of course, a very rare outcome. It may also be used for outcomes that take a very long time to develop. It’s relatively inexpensive, it’s also hypothesis generating and it has a somewhat higher potential to make causal inferences. The weaknesses are that it can study only one outcome at a time, in this case renal agenesis, and that the choice of controls is sometimes a bit difficult. You really need to think very carefully about how you choose your controls.

Slide 11

What about cohort studies? I’ve taken the example of a prospective cohort study. What you do in a cohort study is that at a particular time you compose a cohort which is free of disease, and in that cohort you assess all the exposures. You then let time run, and after a particular period you assess the outcome in relation to the exposure. An example of a cohort study. De Mutsert and co-workers assessed BMI at the start of dialysis and recorded mortality during follow-up. They were able to show that in haemodialysis patients being underweight was associated with an increased risk of death, but any effect of obesity was not statistically significant. Strengths of a cohort study. Cohort studies can study multiple exposures. For example, if I do a cohort study, I can assess BMI at the beginning of the study, but also residual renal function, serum albumin and so on, and I can assess the effects of all these factors on the outcome in the future. They are hypothesis generating and they also have some potential to make causal inferences. However, if you do them prospectively, they cost a lot of time and they also cost a lot of money.

Slide 12

Then there’s, of course, the randomised controlled trial. A randomised controlled trial is very similar to a cohort study. The only difference is that at the beginning you allocate the exposure by randomisation, and then you assess the outcome in the future, in the experimental as well as in the control group. The 4D study as an example. Wanner and co-workers used a double-blind RCT to investigate whether atorvastatin was associated with less cardiovascular disease and mortality in patients with type 2 diabetes on haemodialysis. The primary end point was a composite of death from cardiac causes, fatal stroke, non-fatal myocardial infarction or non-fatal stroke, whichever occurred first. The results showed that after a median follow-up period of 4 years atorvastatin had no statistically significant effect on the primary outcome. Then what is the huge advantage of an RCT? An RCT is the gold standard for making causal inferences in studies trying to show the effects of therapy. Why is that the case? Because randomisation, as they say, breaks the link between the therapy prescription by the doctor and the prognosis of the patients. Think, for example, about the difference between the prescription of peritoneal dialysis and haemodialysis. As a doctor, you will prescribe one of those two modalities to patients with specific characteristics, so one patient is more suitable for PD and another one may be more suitable for haemodialysis. Because you do that, the allocation of the exposure, which is treatment in this case, is not by chance: you influenced that allocation. This effect is called confounding by indication.
The point about RCTs is that they are extremely expensive, much more expensive than cohort studies. Again, they might take a long time to complete, and they are unsuitable for detecting rare events. You can imagine that if only one or two patients develop an event during follow-up, the number of events that you can study is still very small. There are also ethical issues. We could not, for example, do a randomised controlled study of the difference in outcome between dialysis and conservative therapy, just because random allocation of those two is not ethical. Another important disadvantage of RCTs is that they usually use very strict selection criteria. They study the effects in a very selective patient population, and therefore it is not always possible to generalise the results to the wider patient population.
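
How randomisation breaks the link between prescription and prognosis can be sketched with a toy simulation. Everything here is an assumption made for illustration: the prognosis score is an invented standard-normal number, and the rule that patients with a better score are more likely to receive PD is a hypothetical caricature of a doctor's behaviour, not a description of clinical practice.

```python
import random
import statistics


def prognosis_gap(allocate, n_patients=10_000, seed=7):
    """Difference in mean baseline prognosis between the PD and the HD group."""
    rng = random.Random(seed)
    groups = {"PD": [], "HD": []}
    for _ in range(n_patients):
        prognosis = rng.gauss(0.0, 1.0)  # hypothetical baseline prognosis score
        groups[allocate(prognosis, rng)].append(prognosis)
    return statistics.mean(groups["PD"]) - statistics.mean(groups["HD"])


def by_indication(prognosis, rng):
    """Assumed doctor behaviour: a better prognosis makes PD more likely."""
    return "PD" if prognosis + rng.gauss(0.0, 1.0) > 0 else "HD"


def by_randomisation(prognosis, rng):
    """Allocation by chance alone, ignoring the patient's prognosis."""
    return "PD" if rng.random() < 0.5 else "HD"


print("gap when the doctor chooses:", prognosis_gap(by_indication))
print("gap after randomisation:    ", prognosis_gap(by_randomisation))
```

When the doctor chooses, the two groups already differ in prognosis before any treatment is given, so a later difference in outcome cannot be attributed to the treatment. After randomisation the groups are comparable at baseline, apart from chance.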

Slide 13

Let’s go to the choice of study population and the examination methods. What you want to do in a study is examine a representative sample of an adequate size in a standardised and sufficiently valid way. What is also needed is that the methods are acceptable. If in any way possible they should be non-invasive, and why? Because if they pose an important burden on the patient and if they are invasive, then it is less likely that patients are willing to participate in the study. Also, of course, they should be cheap and quick, because otherwise your study will get too expensive, but above all they should be standardised.

Slide 14

The representative sample. A representative sample is a sample that represents the population from which it was drawn on important characteristics. With regard to characteristics you should think of course about age and sex, but also about ethnicity, income and educational level. The statistical inference will be more rigorous if the selection process is random or effectively random, and that means that each individual in the study population has a known, usually identical, probability of selection. That is the case if you take consecutive patients, where you just take them all and don’t make a selection, but also when you draw a random sample from a population.
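
Drawing a simple random sample, in which every individual has the same probability of selection, can be sketched as follows. The patient identifiers and the function name are hypothetical; in practice the sampling frame would come from a registry or a clinic list.

```python
import random


def draw_random_sample(patient_ids, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    Each patient in the sampling frame has the same probability
    (sample_size / number of patients) of being selected.
    """
    rng = random.Random(seed)
    return rng.sample(list(patient_ids), sample_size)


# Hypothetical sampling frame of 1000 patients numbered 0..999
sample = draw_random_sample(range(1000), 50, seed=3)
print(sorted(sample)[:10])
```

Fixing the seed makes the draw reproducible, which is convenient when the selection has to be documented in the study protocol.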

Slide 15

Sample size. Sample size calculation. Why would you want to calculate your sample size? That is to avoid discovering at the final analysis that the numbers did not permit achievement of even the study’s primary objective.
What do you need for a sample size calculation? First of all, you need to set the alpha. The alpha is the significance level, that is the threshold with which you will later compare the p-value. It represents the chance of incorrectly finding an effect that in reality is not there. Usually, as you know, we set the alpha at 0.05 or 0.1. You also need to set the beta, which is 1 minus the power. The beta is the chance of incorrectly not finding an effect that in reality is there. Usually the power is set at 0.90 or at 0.80.

The third thing you need to define is what you think is a clinically relevant difference. What effect size would you minimally want to find if you do this particular study? So it’s the smallest difference that you hope to detect in your study. Once you have set these three, you have also set the sample size, because the sample size is directly calculated from the three others. It is, so to say, a closed system.

Slide 16

However, there are different formulas for the calculation of sample sizes and they depend on the outcome. What you would need to put in a protocol is a formulation like: a sample size of so many in each group will have, for example, 80% power to detect a difference in means of so much, assuming that the common standard deviation is a particular number, using a two-group t-test with a 0.05 two-sided significance level. Those are standard phrases that you can use in a protocol. However, please be aware that a sample size calculation depends on many assumptions and estimations. For example, if you set the alpha differently, if you set the power differently, or if there is discussion about the minimally relevant effect size, that all influences your sample size to a significant extent.
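
For the protocol sentence above, the calculation for a comparison of two means can be sketched with the usual normal approximation, n per group = 2 × ((z for alpha + z for power) × SD / difference)². This is a minimal sketch, not the formula of any particular software package: the numbers in the example (a difference of 5 with a standard deviation of 10) are invented, and a calculation based on the t-distribution gives a slightly larger sample size.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sd, alpha=0.05, power=0.80):
    """Patients needed per group to detect a difference in means `delta`.

    Normal approximation for a two-sided, two-group comparison with a
    common standard deviation `sd`.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)


print(sample_size_per_group(delta=5, sd=10))              # 63
print(sample_size_per_group(delta=5, sd=10, power=0.90))  # 85
print(sample_size_per_group(delta=2.5, sd=10))            # halving delta roughly quadruples n
```

Changing only the power from 0.80 to 0.90 already raises the number per group from 63 to 85, which illustrates how strongly the result depends on the chosen assumptions.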

Slide 17

Then standardisation. Standardisation in physical examinations and clinical investigations. Standardisation is needed to reduce variation within and between observers. It’s easier for quantitative measurements than it is for qualitative measurements, and it also includes the specification of methods for the collection, storage and analysis of lab specimens. For standardisation you also need to train your staff members, and it is very wise to make sure that any training effects are completed before the start of the study. I’ll show you why. Any of you who has ever measured skin fold thickness knows that it takes some time before you are experienced in this matter and that your measurements will vary over time. For example, if we have here two different observers, A and B, they may obtain quite different measurements of the skin fold thickness in the same patient at the same moment. When time passes, and assuming that the patient stays the same, the difference between the measurements of these two observers will become smaller, but some difference between the observers will probably remain. Therefore, it’s very important to make sure that the subjects are randomly allocated to the different observers.
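
Random allocation of subjects to observers can be sketched as a shuffle followed by dealing the subjects out in turn. The subject numbers, observer labels and function name are hypothetical.

```python
import random


def allocate_to_observers(subject_ids, observers, seed=None):
    """Randomly allocate subjects to observers in (nearly) equal numbers."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)  # random order removes any link with the order of arrival
    return {subject: observers[i % len(observers)]
            for i, subject in enumerate(shuffled)}


allocation = allocate_to_observers(range(1, 11), ["A", "B"], seed=5)
for subject in sorted(allocation):
    print(subject, allocation[subject])
```

Because the allocation does not depend on any patient characteristic, a remaining difference between observers adds random variation instead of a systematic difference between patient groups.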

Slide 18

Your measurement instruments. Very important. You should make sure that they are precise, meaning that their measurements are reproducible. Precision refers to the agreement between repeated measurements: if we repeat a measurement, would it produce the same result? If the measurement instrument is precise, it would yield the same result. It has to do with the absence of random error, and if a measurement instrument is precise, it has a high inter-observer agreement and a high test-retest reproducibility.
An example, what do you think, supposing that here this is the true value of a particular measurement, would you think that this measurement instrument is precise? Precise, imprecise? Well, it produces the same results, almost the same results so this measurement instrument is precise.
The second one, is this measurement instrument precise? Yes, it is precise because it has almost the same result when measuring it a number of times. So this one is precise.
This one? No, this one is imprecise because the results are different, it’s quite a scatter in the measurement.

Slide 19

Then validity. Validity deals with the question: does the measurement reflect the truth? Any deviation from the truth is called systematic error, which is also called bias. The instrument with the highest validity is also called the gold standard. Again examples.
Is this measurement instrument valid, if we assume that this here is the truth? Yes it’s valid because there’s no deviation from the truth. Is this one valid? Well, I think we can say that it’s imprecise but it is valid because on average we have the truth here. So this one is valid.
This one? This one is precise because all the measurements are very close together, but they are far from the truth, so this measurement instrument is invalid.

Slide 20

Then we could say that imprecision is less of a problem than invalidity, because random error can be reduced by increasing the sample size. If I just make sure that I measure thousands of patients, then the random error will go down. However, if I have an invalid measurement instrument, I will always have the same systematic error. So invalidity is really a huge problem and you should not use invalid instruments for your study.
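
The difference between the two kinds of error can be sketched by simulating repeated measurements. The true value of 100, the bias of 5 and the standard deviations are invented numbers, and the function names are hypothetical.

```python
import random
import statistics


def measure(true_value, bias, random_sd, n, seed=11):
    """Simulate n measurements with a systematic error (bias) and random error."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, random_sd) for _ in range(n)]


def summarise(measurements, true_value):
    """Systematic error of the mean and spread of the individual measurements."""
    return {
        "systematic_error": statistics.mean(measurements) - true_value,
        "spread": statistics.stdev(measurements),
    }


# Imprecise but valid: large random error, no bias
print(summarise(measure(100, bias=0, random_sd=10, n=10_000), 100))
# Precise but invalid: small random error, bias of 5
print(summarise(measure(100, bias=5, random_sd=0.5, n=10_000), 100))
```

With many measurements the mean of the imprecise instrument ends up close to the truth, whereas the mean of the invalid instrument stays about 5 units off, however many patients are measured.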

Slide 21

Then what about data quality? When you collect your data, it’s wise to code them and to enter the data in duplicate to make sure that the number of mistakes is as low as possible. What you can also do is make range checks for all variables at entry. So if I tried to enter February 31st, it would simply not be possible because that date does not exist. What you can also do is use internal consistency checks. For example, we should make sure that any date of death is later than or the same as the date of start of RRT, which in turn is later than the date of birth.
Last but not least you could pay site visits to the different study centres to see whether the data in their medical records are exactly the same as the data that you have in your study database.
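
The three checks mentioned here (range checks at entry, internal consistency checks and double data entry) can be sketched as follows. The field names `birth`, `start_rrt` and `death` and the function names are hypothetical and would have to be adapted to the actual study database.

```python
from datetime import date


def parse_date(year, month, day):
    """Range check at entry: return a date, or None if the date does not exist."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


def consistency_errors(record):
    """Check that birth < start of RRT <= death (death may be missing)."""
    errors = []
    if not record["birth"] < record["start_rrt"]:
        errors.append("start of RRT not after birth")
    death = record.get("death")
    if death is not None and death < record["start_rrt"]:
        errors.append("death before start of RRT")
    return errors


def double_entry_mismatches(first, second):
    """Fields on which two independent entries of the same form disagree."""
    return sorted(k for k in first if first[k] != second.get(k))


print(parse_date(2010, 2, 31))  # None: February 31st does not exist
record = {"birth": date(1950, 1, 1), "start_rrt": date(2005, 6, 1),
          "death": date(2004, 3, 15)}
print(consistency_errors(record))
print(double_entry_mismatches({"age": 61, "sex": "F"}, {"age": 16, "sex": "F"}))
```

Records flagged by such checks still have to be resolved against the source documents, which is where the site visits come in.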

Slide 22

I would like to finish with the pilot study. In many cases, if we want to do a very large study, we also want to do a pilot study. Why should we do that? Because we want to improve the methodological quality if needed. For example, we want to test the different measuring instruments in that study, but we also want to check whether the study is feasible, and if it turns out that in some areas there are problems, we really should try to change the organisation of the study.

Sometimes funding bodies in particular ask you to do a pilot study and to provide them with data on the size of the effect. Does any of you have an idea whether that is sensible? Is that a good idea? Is that a reason for doing a pilot study? Pilot studies are, I think, by definition always small studies, small compared to the study that you want to do in the future: let’s say 200 patients compared to the 2000 patients that you want to study in the future. However, please remember the picture of the funnel plot that I showed you: at the bottom, in the area of the small studies, the effect sizes are very much dependent on chance, so they can vary a lot. You can only measure the size of the effect when you do the large study, because in smaller studies the effect size is too dependent on chance. So even if funding bodies ask you to give an indication of the size of the effect from a pilot study, you should not do that. You should instead define what you think is a clinically relevant effect in order to calculate a sample size. Using a pilot study for this is a commonly made mistake.
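
How much more an effect estimate from a pilot study depends on chance can be sketched by comparing the width of the 95% confidence interval for a difference in means. The standard deviation of 10 and the equal split of 200 and 2000 patients over two groups are assumptions made for illustration, and the normal approximation is used.

```python
from statistics import NormalDist


def ci_half_width(sd, n_per_group, confidence=0.95):
    """Half-width of the confidence interval for a difference in two means."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd * (2 / n_per_group) ** 0.5


pilot = ci_half_width(sd=10, n_per_group=100)    # 200 patients in total
main = ci_half_width(sd=10, n_per_group=1000)    # 2000 patients in total
print(round(pilot, 2), round(main, 2))           # 2.77 0.88
```

Under these assumptions the pilot estimate is uncertain by almost 3 units in either direction, roughly three times the uncertainty of the main study, so an effect size taken from the pilot is a poor basis for a sample size calculation.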

Slide 23

Thank you very much.