Darius A. Rastegar
Scott M. Wright contributed to an earlier version of this chapter.
General Approach
Clinicians are increasingly (and appropriately) asked to provide both scientifically sound and cost-effective medical care. These expectations have given rise to an emphasis on evidence-based medicine (EBM), which is defined as the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients (1). EBM focuses on issues integral to day-to-day patient care: assessment of risks, prevention, screening, diagnosis, prognosis, treatment, and management of the increasing amount of medical information that confronts health care practitioners.
Evidence-based decision making is especially important in ambulatory practice because this is the setting where patients are most likely to present with undifferentiated problems. It is also the setting where most clinical decisions are made.
The following steps are considered to be indispensable to practicing EBM:
1. Formulate a specific, answerable clinical question.
2. Search for the current best evidence.
3. Critically appraise the evidence for its validity and relevance.
4. Integrate the evidence with clinical expertise and the patient's values in caring for the patient.
The importance of step 1, formulation of specific questions, can be understood by considering two similar questions that might be generated when a practitioner sees a patient with hepatitis C who asks whether antiviral therapy, which the patient has read about in the newspaper, should be initiated:
1. Should this patient receive antiviral therapy?
2. In an adult with chronic hepatitis C infection, does antiviral therapy reduce the risk of progression to cirrhosis, hepatocellular carcinoma, or liver-related death?
The second question is more specific and will better help to tailor the search effort (step 2) to the clinical outcomes that are most relevant to the practitioner and the patient.
The successful completion of step 2 requires efficient and effective searching skills. Most medical libraries offer brief hands-on tutorials to teach clinicians how to search databases such as MEDLINE and the Internet to find the current best evidence. The National Library of Medicine (see http://www.hopkinsbayview.org/PAMreferences) provides access to PubMed (MEDLINE) and multiple health and science databases. It also offers full-text versions of many articles, eliminating the need for additional steps to retrieve the desired manuscripts.
Step 3, critical appraisal, is likely to be most difficult and time-consuming for clinicians. The two components of this step are (a) deciding whether the results are valid and (b) deciding whether the results are relevant to the specific question being asked.
It is more efficient to resolve the second component first, which can usually be done fairly quickly. If the results are neither relevant nor clinically important, then one can avoid the time and effort spent judging the validity and quality of the information. There are numerous books and articles published in the medical literature (e.g., the Users’ Guides to the Medical Literature series published by the Journal of the American Medical Association) that aim to teach clinicians the core skills of critical appraisal. Having confidence in one's ability to critically appraise manuscripts on a wide variety of topics (e.g., diagnosis, treatment, cost-effectiveness), and which use a myriad of study designs, may take time, practice, and even additional training. Such training can be found in workshops at regional and national meetings or through medical libraries.
Step 4 involves integrating the important and valid newly found information into the care of one's patient. This step can be the most satisfying component of practicing EBM. Educating patients that a particular diagnostic approach or treatment is supported by current medical research may instill a sense of confidence about the practitioner's knowledge and expertise in finding new data. However, even after completing all these steps, choosing the best course of action is not always straightforward, and the patient's values and wishes should determine the ultimate course of action. For example, a patient may wish to forgo a treatment that may prolong their life but will likely worsen their quality of life (e.g., chemotherapy for metastatic cancer) or a patient may be unwilling to trade a short-term risk for the possibility of a long-term benefit (e.g., carotid endarterectomy for asymptomatic carotid stenosis).
It would be impractical to assume or recommend that primary care practitioners embark on these fundamental steps of EBM every time a clinical question comes up. However, when critical queries arise that are likely to recur or are particularly important to an individual patient, this version of “self-directed continuing medical education” is likely to be helpful to both practitioners and patients. Some barriers to practicing EBM include skepticism by practitioners, information overload and feeling overwhelmed by the growth of medical knowledge, lack of time, and lack of appropriate resources, skills, or motivation to implement EBM (2). Furthermore, for some clinical questions, high-quality data is lacking.
All dedicated and committed clinicians, however, practice EBM to some degree. To counterbalance the barriers to practicing EBM, the following facilitating behaviors have been proposed: (a) reading and keeping up-to-date with the medical literature (see Keeping Up); (b) refining one's EBM skills (practice makes perfect); (c) collaborating with colleagues so that valuable clinical evidence is shared among practitioners; (d) writing down specific clinical questions (step 1) when they come up so that the process can continue when time permits; (e) setting up one's computer (e.g., bookmarking relevant websites) and one's office (e.g., acquiring access to high-quality information) so as to find information efficiently; and (f) making friends with the librarian at the nearest medical library.
The remainder of this chapter discusses the core principles of EBM that apply to issues most relevant to primary care practice: diagnosis, prognosis, treatment, risk or potential harm, and cost-effectiveness. Strategies for keeping up are also discussed. Chapter 14 discusses principles that apply to prevention and screening.
Diagnosis
How Clinicians Formulate a Diagnosis
Diagnostic assessment begins the moment one meets a patient. Behavioral scientists have described at least four ways in which clinicians formulate diagnoses: pattern recognition, algorithm, exhaustion, and hypothesis-deduction.
Pattern Recognition
Many diagnoses are made instantly because clinicians have learned to recognize patterns specific to certain diseases, such as the face of a patient with Down syndrome or the elbows of a patient with psoriasis. The certainty of these types of diagnoses is so great that further testing often is unnecessary.
Algorithm
Algorithms are growing more common as a result of the growth of clinical practice guidelines, which, when grounded scientifically, can be extremely helpful. The drawbacks of algorithms are that they must be constructed before the patient is seen, and they must account for every possibility in a workup.
Exhaustion
As Sackett pointed out (see Sackett et al., Clinical Epidemiology, at http://www.hopkinsbayview.org/PAMreferences), medical students should be taught how to do a complete history and physical examination, and then be taught never to do one again. On occasion, however, clinicians do resort to comprehensive histories and examinations, as much to buy time to think as to uncover hidden disease.
Hypothesis–Deduction
Clinicians usually diagnose by forming hypotheses and testing them, as is done in scientific experimentation. On hearing that a patient has chest pain, the practitioner builds a short list of hypotheses, invites further description, and then asks focused questions that help confirm or rule out the hypotheses. The questions in the interview and each maneuver in the examination are as much diagnostic tests as the electrocardiogram or the chest radiograph. Studies of clinicians’ behavior reveal that the short list of hypotheses usually does not exceed three or four diagnoses. Typically, new hypotheses are added as others are discarded, but the eventual goal is to narrow the list and reduce the uncertainty about which diagnosis is most likely. Studies of clinicians in ambulatory practice showed that hypotheses were generated, on average, 28 seconds into the interview, and that correct diagnoses of standard problems were made 6 minutes into 30-minute workups; the correct diagnoses were made in 75% of the encounters (3).
The hypothesis–deduction model reveals a truth common to all methods of diagnosis: Rarely can a clinician be absolutely certain of any diagnosis. Clinicians live with uncertainty, and the role of all diagnostic tests—the interview, the physical examination, the laboratory evaluation, trials of empiric treatments, allowing time to pass (expectant observation)—is to narrow the uncertainty enough to place a diagnostic label on a patient's problem. How narrow the uncertainty must be depends on the practitioner's and the patient's tolerance of uncertainty, the severity of the suspected disease, the “treatability” of the suspected disease, and the benefits and risks of possible treatments.
Steps in the Hypothesis–Deduction Process
Evidence shows that clinicians implicitly use common sense and their medical knowledge to reach a diagnosis with adequate certainty. Explicitly, the diagnostic process follows certain steps.
Step 1: Form a Hypothesis and Estimate Its Likelihood
The estimate of likelihood is called the pretest probability (or prior probability); it simply represents the estimate of prevalence of the disease in a group of people similar to the patient at hand. Each hypothesized diagnosis and the estimate of its likelihood comes initially from evidence collected during the interview and physical examination and from the practitioner's fund of knowledge from sources such as other patients, colleagues, textbooks, and journals. More recently, computer programs have been developed to aid clinicians in making this estimate; these programs have the potential to become a powerful tool in clinical decision making.
Step 2: Decide How Certain the Diagnosis Must Be
If the hypothesized disease is easily and safely treated, one might have to be less certain than if the disease has an ominous prognosis or demands complex, risk-laden treatment. For example, a 75% certainty that a patient has streptococcal pharyngitis might be sufficient to prescribe an antibiotic, whereas a much higher level of certainty is needed before diagnosing and treating a patient with suspected leukemia. If the pretest probability is above the threshold for a hypothesized disease (e.g., greater than 75% for streptococcal pharyngitis), further tests are unnecessary and treatment is prescribed. Conversely, if one is adequately certain that the patient does not have the hypothesized disease (e.g., 90% probability that the patient does not have streptococcal pharyngitis), no further tests are required and the patient can be reassured and educated. However, if the level of uncertainty remains between these two extremes, further testing (e.g., a throat culture) can help move the case toward one extreme or the other. Diagnostic testing usually is most helpful between the two extremes of certainty, whereas further testing generally has little impact on the posttest probability if the pretest probability is very high or very low.
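To make this threshold logic concrete, the following is a minimal sketch in Python. The function name and the specific cutoffs (a 75% treatment threshold and a 10% testing threshold, loosely echoing the pharyngitis example above) are illustrative assumptions, not part of the original text or any validated clinical rule.

```python
def next_step(pretest_prob: float,
              treat_threshold: float = 0.75,
              test_threshold: float = 0.10) -> str:
    """Decide among treating, testing further, or reassuring.

    Hypothetical thresholds: treat if the probability of disease
    exceeds the treatment threshold, reassure if it falls below the
    test threshold, and order a test (e.g., a throat culture) when
    the probability lies between the two extremes.
    """
    if pretest_prob >= treat_threshold:
        return "treat without further testing"
    if pretest_prob <= test_threshold:
        return "reassure; no further testing"
    return "order a diagnostic test"

print(next_step(0.80))  # treat without further testing
print(next_step(0.40))  # order a diagnostic test
print(next_step(0.05))  # reassure; no further testing
```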
Step 3: Choose a Diagnostic Test
Which test to choose depends on many factors, including its safety, its accuracy (e.g., how closely an observation or a test result reflects the true clinical state of a patient), how easily it can be done, its cost, and, not least, the patient's preferences and values regarding tests, especially those that carry risks. Accuracy includes both reliability and validity. Reliability of a test, also called reproducibility or precision, is the extent to which repeated measurements of a stable phenomenon give results close to one another. Validity is the degree to which a test measures what it is supposed to measure. A test can be reliable but not valid (i.e., it reliably measures the wrong phenomenon), or it can be valid but not reliable (i.e., it measures the phenomenon of interest, but with wide scatter).
When considering a test, one needs to reflect on each of these factors. Table 2.1 summarizes practical guidelines to assess and critically appraise reported studies of diagnostic tests. When selecting a test for a patient, the crucial questions to ask are, “Will the results of the test change my plan?” and “Will my patient be better off from having had the test?” (the utility of the test). If the answer to these questions is “No,” the test should not be performed.
TABLE 2.1 Guidelines for Assessing a Study of a Diagnostic Test
Step 4: Be Aware of the Test's Performance Characteristics
Every diagnostic test has a sensitivity and specificity for each disease it tests for. Sensitivity and specificity have become common terms in medical discussion, but they are often misunderstood. The sensitivity of a test (the true positive rate) is equal to the number of study subjects with a given disease who have a positive test divided by all study subjects with the disease. The specificity of a test (the true negative rate) is the number of study subjects without the disease who have a negative test divided by all those without the disease. The 2 × 2 table in Fig. 2.1 reveals much about these and related terms.
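As a concrete illustration of these definitions, the following Python sketch computes sensitivity and specificity from the four cells of a 2 × 2 table. The function and the cell counts are hypothetical; they are not the data behind Figure 2.1.

```python
def test_characteristics(tp, fp, fn, tn):
    """Sensitivity and specificity from a 2 x 2 table.

    tp, fn: diseased subjects with positive and negative tests
    fp, tn: nondiseased subjects with positive and negative tests
    """
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return sensitivity, specificity

# Hypothetical counts, for illustration only:
sens, spec = test_characteristics(tp=70, fp=15, fn=15, tn=135)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
# sensitivity = 0.82, specificity = 0.90
```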
FIGURE 2.1. Test performance determined by research. Researcher identifies diseased and nondiseased patients using a gold standard and then determines the performance characteristics (sensitivity and specificity) of another test. Example: Iron deficiency anemia determination using bone marrow aspirate/biopsy as the gold standard and serum ferritin measurement as the screening test. (Data from Guyatt GH, Patterson C, Ali M, et al. Diagnosis of iron deficiency in the elderly. Am J Med 1990;88:205.)
Tests with high sensitivity have a low false-negative rate and are useful for “ruling out” a diagnosis (when they are negative). Conversely, tests with high specificity have a low false-positive rate and are useful for “ruling in” a diagnosis (when they are positive). One way of remembering this is with the mnemonics SnNOut (high sensitivity, negative result rules out) and SpPIn (high specificity, positive result rules in). However, it should be pointed out that these rules of thumb do not always hold up in actual practice; the ability of a sensitive test to rule out a diagnosis is reduced when the specificity is low (4).
“Diseased” and “not diseased” are labels that reflect a best test or a definition of a certain disease: the so-called gold standard. For pulmonary embolus, for example, the gold standard is the pulmonary angiogram. For angina, there is no sure test, so a case definition becomes the gold standard. Skepticism must be used in evaluating gold standards, for they often have their own limitations. For example, when gallbladder ultrasonography was tested for use in the diagnosis of cholelithiasis, it initially seemed to be a poor test in comparison with the gold standard (oral cholecystogram), not because of problems with the new test, but because, as was later shown, the gold standard was itself a poor test (5). Studies of diagnostic testing may have other problems, including verification bias (when those with a positive test result are more likely to have further evaluation), spectrum bias (when the population tested does not reflect those in whom the test will be used), and incorporation bias (when the results of the test under study are included among criteria to establish the reference standard).
TABLE 2.2 Tradeoff between Sensitivity and Specificity when Diagnosing Iron-Deficiency Anemia: Likelihood Ratios
Sensitivity and specificity are not static properties of a test. As the cutoff value for an abnormal result is made more extreme, the test's sensitivity decreases and its specificity increases. Table 2.2, in which progressively lower ferritin levels are used to characterize elderly patients as having iron-deficiency anemia (IDA), illustrates this principle (6). This illustration matches the common-sense conclusion that as a patient's test result becomes more abnormal, one can be more certain that the patient has disease (although never fully certain). If one selects a very low ferritin level for the cutoff between normal and abnormal (Table 2.2), many iron-deficient people will remain undiagnosed (i.e., the sensitivity will be low), but almost all of those diagnosed will be truly iron deficient (i.e., the specificity will be high). Conversely, if one decides to label patients as having IDA based on a ferritin level well within the normal range (e.g., 75 µg/L), one will not miss much disease (higher sensitivity) but will falsely label as iron deficient numerous anemic patients who are not (lower specificity). When interpreting the results of a test, a clinician must therefore consider the severity of the disease and the potential risks and benefits of treatment.
Another way of showing the relationship (and trade-off) between sensitivity and specificity is to plot a receiver operating characteristic (ROC) curve; the true-positive rate (sensitivity) is plotted on the vertical axis and the false-positive rate (1 - specificity) on the horizontal axis. Figure 2.2 shows a plot of the values provided in Table 2.2. ROC curves can be a useful tool to compare different diagnostic tests; in general, the closer the curve comes to the upper left-hand corner (100% sensitivity and specificity), the better the test performs.
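The cutoff trade-off, and the way an ROC curve is built from it, can be sketched in a few lines. The ferritin values below are invented for illustration; they are not the data behind Table 2.2 or Figure 2.2.

```python
# Sweep the ferritin cutoff over small hypothetical samples of
# iron-deficient and iron-replete patients and recompute the test
# characteristics at each cutoff.
deficient = [5, 12, 18, 25, 33, 40, 60, 90]       # ferritin, ug/L
replete   = [30, 45, 70, 95, 120, 150, 200, 300]  # ferritin, ug/L

for cutoff in (18, 35, 75):
    sens = sum(f < cutoff for f in deficient) / len(deficient)
    spec = sum(f >= cutoff for f in replete) / len(replete)
    print(f"cutoff < {cutoff:>2} ug/L: "
          f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")

# As the cutoff rises, sensitivity rises and specificity falls;
# plotting sensitivity against (1 - specificity) at each cutoff
# traces the ROC curve.
```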
Step 5: Determine a Posttest Probability of Disease
The perfect test (100% sensitivity and specificity) would yield a “yes” or “no” answer to the question “Does my patient have disease or not?” However, because no test is perfect, the more appropriate question is: “Given the result of this test, what is the posttest probability that my patient has (or does not have) disease?” Posttest probability takes into account both the performance characteristics (sensitivity and specificity) of the test and the pretest (prior) probability of disease in a group of patients similar to the patient in question.
FIGURE 2.2. Receiver operating characteristic (ROC) curve of serum ferritin for iron-deficiency anemia.
One method for determining posttest probability is through the use of predictive values. Predictive values can be calculated from the known sensitivity and specificity of a test and the estimated pretest probability of disease. Sensitivity and specificity are generally transferable from study to practice settings, provided the diseased and nondiseased populations in the study and in the practice settings are similar. Sensitivity and specificity usually are not influenced by the prevalence, or pretest probability, of disease. However, predictive values must be recalculated for each patient or population from the estimated pretest probability or prevalence of disease in that particular group. Positive predictive value is the probability of disease in a patient who has an abnormal test result. Negative predictive value is the probability of no disease in a patient for whom a test result is normal. Figure 2.1 illustrates the calculation of posttest probability, based on pretest probability, sensitivity, and specificity.
The lower the pretest probability of disease, the lower the positive predictive value of a test, the lower the posttest probability of disease, and the more likely it is that a positive test result is falsely positive. This influence of pretest probability on posttest probability makes intuitive sense. For example, when a seasoned clinician encounters an unexpected positive test result in a patient with a very low likelihood of disease, the clinician is suspicious of the finding and either repeats the test, suspecting laboratory error, or orders another, more specific test to confirm or refute the finding.
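A short sketch shows how predictive values follow from sensitivity, specificity, and pretest probability, and how strongly the positive predictive value depends on prevalence. The test characteristics used here (sensitivity 0.82, specificity 0.90) are the same hypothetical values as in the earlier sketch, not those of any particular test.

```python
def predictive_values(sens, spec, prevalence):
    """Posttest probabilities from sensitivity, specificity, and
    pretest probability (prevalence), via Bayes' theorem."""
    tp = sens * prevalence               # true positives
    fp = (1 - spec) * (1 - prevalence)   # false positives
    fn = (1 - sens) * prevalence         # false negatives
    tn = spec * (1 - prevalence)         # true negatives
    ppv = tp / (tp + fp)  # P(disease | positive test)
    npv = tn / (tn + fn)  # P(no disease | negative test)
    return ppv, npv

# The same hypothetical test applied at different pretest probabilities:
for prev in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.82, 0.90, prev)
    print(f"prevalence {prev:4.0%}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
# prevalence   1%: PPV = 0.08, NPV = 1.00
# prevalence  10%: PPV = 0.48, NPV = 0.98
# prevalence  50%: PPV = 0.89, NPV = 0.83
```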
Published information is available that can be helpful in estimating pretest probability, and therefore the predictive value of test results, in patients with selected characteristics. Examples of how such information can be used to interpret test results and determine diagnostic strategies are illustrated elsewhere in this book for deep vein thrombosis (see Chapter 57) and renovascular hypertension (see Chapter 67).
Another method of calculating posttest probability of disease is through the use of a likelihood ratio (LR). This number combines the relationships of sensitivity and specificity into a single number. The positive LR (+LR) is the true-positive rate (sensitivity) divided by the false-positive rate (1 - specificity), and the negative LR (-LR) is the false-negative rate (1 - sensitivity) divided by the true-negative rate (specificity). An LR can range from 0 to infinity; an LR between 0 and 1 decreases the posttest probability, an LR greater than 1 increases it, and an LR of exactly 1 leaves the posttest probability unchanged (i.e., the test is not useful).
There are a few ways of using the LR to calculate posttest probabilities. The standard method is to convert the pretest probability into odds, multiply the pretest odds by the likelihood ratio to determine the posttest odds, and then convert the posttest odds back to a probability:

pretest odds = pretest probability ÷ (1 - pretest probability)
posttest odds = pretest odds × LR
posttest probability = posttest odds ÷ (1 + posttest odds)
FIGURE 2.3. Nomogram for interpreting test results using likelihood ratios. Example from text: An elderly male patient with anemia has a pretest probability of having IDA equal to 33%. His serum ferritin level is 33 µg/L, which is associated with a positive LR of 14.3. Extending a straight line through the pretest probability of 33% and the LR of 14.3 results in a posttest probability of 88%. (Adapted from Fagan TJ. Nomogram for Bayes’ theorem. N Engl J Med 1975;293:257.)
Another method is to use a nomogram (Fig. 2.3) that allows conversion of pretest to posttest probabilities, given a known LR, without having to convert back and forth between probabilities and odds. This alternative is quick, is easy to use, and decreases the chances of calculation error.
However, converting probabilities to odds and back can be cumbersome, and most of us do not carry nomograms in our pockets. For this reason, it may be simpler to use a method of estimating posttest probabilities (7). This method is fairly accurate when the pretest probability is between 10% and 90% (i.e., neither very high nor very low). Table 2.3 summarizes the approximate change in probability associated with a range of LRs. One can simply remember that positive LRs of 2, 5, and 10 are associated with approximate posttest probability increases of 15%, 30%, and 45%, respectively. Conversely, LRs of 1/2 (0.5), 1/5 (0.2), and 1/10 (0.1) decrease the posttest probability by approximately 15%, 30%, and 45%, respectively.
TABLE 2.3 Simplified Posttest Probability Estimates Based on Likelihood Ratio
For example, suppose the clinician is faced with a 67-year-old male patient who has increasing fatigue and is found to be anemic. Knowing that the baseline prevalence (pretest probability) of IDA among anemic elderly patients is 31% (6), one might consider this man's pretest probability of IDA to be approximately 33%, for an odds of 1:2, or 0.5. If the serum ferritin is 33 µg/L, we can see from Table 2.2 that when a cutoff of <35 µg/L is used, the positive LR is 14.3:

posttest odds = pretest odds × LR = 0.5 × 14.3 ≈ 7
So the odds of the patient having IDA based on this test result are 7:1. Converting back to probability, the patient has a posttest probability of IDA of about 7 ÷ (1 + 7) = 7 ÷ 8 = 88%. Given this posttest probability, further diagnostic workup (e.g., colonoscopy) to identify the cause of the IDA is appropriate.
Using the nomogram and a straightedge, the posttest probability is approximately 85%. Finally, if we use the simplified estimation method outlined earlier, we know that the likelihood ratio is >10; consequently, we should add at least 45% to the pretest probability of 31%, yielding a posttest probability of >76%, which is probably close enough to the actual value to help us in our decision making.
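The odds-and-LR arithmetic in this example is easy to encode. The following sketch simply implements the three conversion formulas given earlier and reproduces the chapter's ferritin calculation.

```python
def posttest_probability(pretest_prob, lr):
    """Posttest probability via the odds form of Bayes' theorem."""
    pretest_odds = pretest_prob / (1 - pretest_prob)  # probability -> odds
    posttest_odds = pretest_odds * lr                 # apply likelihood ratio
    return posttest_odds / (1 + posttest_odds)        # odds -> probability

# Chapter example: pretest probability of IDA of about 33%; a ferritin
# of <35 ug/L carries a positive LR of 14.3 (Table 2.2).
p = posttest_probability(0.33, 14.3)
print(f"posttest probability = {p:.0%}")  # posttest probability = 88%
```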
Prognosis
Often, the information that is most important to a patient who has a new diagnosis is the prognosis (“What is going to happen to me?”). In choosing therapy, one decides what one can do for the patient's disease. Yet, predicting what will happen to a particular patient usually is not possible, and clinicians must rely on probabilities. Sometimes, specific characteristics (“prognostic factors”) such as demographic factors, disease-specific factors, and comorbidities can help further delineate a patient's prognosis. Clinical prediction rules that take these factors into account can help practitioners arrive at more accurate estimates of prognosis.
Prognosis can be addressed in two ways: the natural history of a disease and the clinical course of a disease. Because few diseases today progress without medical intervention, less is being learned about natural history and more is being learned about clinical course. For example, the natural history of diabetes in the late 20th century is unknown because virtually no diagnosed patients go without some type of therapy, yet through many studies, more is known about the course of treated diabetes.
Most information about prognosis comes from prospective cohort studies in which patients with a disease are monitored over time. Cohort studies may include only untreated subjects (natural history of a disease), only treated subjects, or a combination of both treated and untreated subjects (clinical course of a disease). Cohort studies are simple in design, yet they are often costly in time and money. They are susceptible to biases, such as sampling bias, in which the group of patients being monitored is not representative of all patients with that condition. Table 2.4 summarizes suggested guidelines for assessing studies of prognosis.
Treatment
Once a diagnosis is made, treatment becomes the focus of care. Before embarking on a treatment plan, one must decide on the goals of treatment (to cure, delay complications, relieve acute distress, reassure, or comfort). Clearly, more than one goal may be chosen. For example, when diagnosing and treating type 2 diabetes, one may seek to cure (counsel weight loss and exercise), to delay or prevent complications (seek tight glucose control), and to relieve distress, reassure, and comfort (listen to the patient's fears, reassure the patient that diabetes is a treatable disease and that he or she will not be abandoned).
TABLE 2.4 Guidelines for Assessing a Study of Prognosis
Once the goals have been set, treatments are chosen. Unfortunately, many treatments have never been tested scientifically in ways that answer the questions that are of interest to clinicians and their patients (e.g., probability of benefit, size of benefit, onset time and duration of response, frequency of complications of treatment), and many aspects of treatment are difficult to measure through scientific experiments. Fortunately, drugs and procedures are increasingly being subjected to clinical trials, and measures of quality of life are being included in the evaluation of therapies.
The clinical trial is the current standard for assessment of drugs and therapeutic procedures. The strongest clinical trials are randomized, double-blinded controlled trials. The strength of a randomized controlled trial (RCT) is that the study groups are likely to be similar with respect to known determinants of outcome, as well as those determinants that are unknown. However, randomization is often difficult to accomplish in the real world, where patients are free to join or refuse to join a clinical trial and where money to support research is limited. Theoretically, in a trial that is double blinded (meaning that neither the patient nor the researcher knows who is receiving the experimental treatment), the researchers’ and patients’ assessment of outcome is not biased by prior knowledge of their assignment (e.g., to placebo or to active treatment). However, studies may not be truly blinded; for example, in a trial of β-blockers against placebo, patients and clinicians can measure pulse rates. Nonetheless, the clinical trial is the least-biased method currently available for researchers to test how well drugs and other interventions work in ideal situations (efficacy) and in the real world (effectiveness). Table 2.5 lists guidelines that clinicians can use when assessing the results of a clinical trial. As illustrated in the table, there are important questions to ask of a clinical trial that reports benefits to treated subjects. Were clinically relevant outcomes, such as measures of patient health (e.g., morbid events, functional status), reported, and not just surrogate end points (e.g., reduction of blood pressure)? Was all-cause mortality, not just mortality caused by the disease in question (e.g., colon cancer), reported? In addition to reporting the statistical significance of findings (the probability that the findings are not due to chance), did the study discuss or clarify the clinical significance of the findings (whether the benefits were clinically meaningful)? As the size of a study increases, there is an increased likelihood that clinically small or nonmeaningful benefits, which are nonetheless statistically significant, will be demonstrated.
TABLE 2.5 Guidelines for Assessing a Study of Treatment (Clinical Trials)
Moreover, one must pay close attention to the follow-up of the subjects enrolled in trials; intention-to-treat analysis is a strategy for analyzing data in which all study participants are analyzed in the group to which they were assigned, regardless of whether they dropped out, were noncompliant, or crossed over to another treatment or nontreatment group. Such an analysis may weaken the ability of a study to demonstrate the effect of a treatment, but it prevents selection biases caused by differences between participants who drop out of a treatment and those who remain.
Researchers often report treatment outcomes in terms of the relative risk reduction (RRR), which is the difference in the event rate between control and experimental groups of patients expressed as a proportion of the event rate in the control group: RRR = (control event rate - experimental event rate) ÷ control event rate. The difference between the control and experimental event rates is the absolute risk reduction (ARR): ARR = control event rate - experimental event rate. RRR can alternatively be expressed as the ARR divided by the control event rate: RRR = ARR ÷ control event rate. RRR is only meaningful in the context of absolute risk and can be misleading when applied to individual patients. If someone is at very low risk for an adverse outcome, a treatment with even a high RRR will have negligible effect on their absolute risk. On the other hand, for someone who is at high risk for an adverse event, even a small RRR can have a significant impact on their absolute risk. One method of incorporating absolute risk into an assessment of an intervention's impact, besides stating ARR, is to calculate the number needed to treat (NNT). This refers to the number of persons who need to be treated for one person to benefit and is a more useful measure for a clinician than the RRR. The calculation for NNT is simply 100%/ARR, with ARR expressed as a percentage, or 1/ARR, with ARR expressed as a fraction (e.g., 0.10 for an ARR of 10%).
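These relationships are simple enough to compute directly. In the sketch below, the event rates are hypothetical, chosen only to show that the same relative risk reduction yields very different absolute benefits in high- and low-risk groups; NNT is rounded up by convention.

```python
import math

def treatment_effects(control_rate, experimental_rate):
    """RRR, ARR, and NNT from event rates expressed as fractions."""
    arr = control_rate - experimental_rate  # absolute risk reduction
    rrr = arr / control_rate                # relative risk reduction
    nnt = math.ceil(1 / arr)                # number needed to treat (rounded up)
    return rrr, arr, nnt

# Hypothetical event rates for a high-risk and a low-risk population:
for label, control, treated in (("high risk", 0.200, 0.120),
                                ("low risk ", 0.025, 0.015)):
    rrr, arr, nnt = treatment_effects(control, treated)
    print(f"{label}: RRR = {rrr:.0%}, ARR = {arr:.1%}, NNT = {nnt}")
# high risk: RRR = 40%, ARR = 8.0%, NNT = 13
# low risk : RRR = 40%, ARR = 1.0%, NNT = 100
```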
These concepts can be illustrated using the results of two trials of beta-hydroxy-beta-methylglutaryl-coenzyme A (HMG-CoA) reductase inhibitors (“statins”) for the prevention of myocardial infarction. The Scandinavian Simvastatin Survival Study (4S) included subjects with high cholesterol levels and a history of coronary heart disease (8). In contrast, the Air Force/Texas Coronary Atherosclerosis Prevention Study (AFCAPS/TexCAPS) trial included a lower-risk group of individuals with average cholesterol levels and no known heart disease (9). Table 2.6 provides the rates of myocardial infarction (fatal and nonfatal) in each trial and shows how to calculate the RRR, ARR, and NNT. Although treatment with a statin in both trials yielded similar relative risk reductions (≈40%), the absolute risk reductions and numbers needed to treat are quite different. This illustrates the importance of understanding an individual's risk when trying to gauge the impact of a therapeutic intervention; a practitioner (on average) would need to treat 83 patients with average cholesterol levels and no history of heart disease with a statin for 5 years to prevent a myocardial infarction, while only 12 patients with high cholesterol levels and heart disease would need to be treated to prevent one event.
TABLE 2.6 Use of Data to Estimate Clinical Consequences of Treatment: Comparison of Two Trials
There are a few caveats about clinical trials. Although the RCT is the best study design for assessing the value of a treatment, one should be cautious about relying on the results of any single study, even one that was done well. Systematic reviews and meta-analyses, which combine the results of a number of studies, are discussed later in this chapter (see Keeping Up). Sometimes clinical trials have not been performed. In this situation, the clinician may need to rely on cohort, case-control, or cross-sectional studies. These types of studies are more commonly used to assess risk or harm and are discussed in the next section.
Risk or Potential Harm
Practitioners are frequently called on to make assessments and judgments regarding risk or potential harm resulting from either medical interventions or environmental exposures. Table 2.7 summarizes some of the guidelines for assessing evidence of harm. Ideally, these questions would be answered in a RCT; however, for obvious ethical reasons, RCTs are not undertaken with the intent of studying a harmful exposure. Sometimes, a potentially beneficial intervention is unexpectedly found to be harmful in a clinical trial, or there may be both benefits and harms associated with the intervention.
More commonly, harm is addressed through observational studies. One kind of observational study is a cohort study, in which exposed and unexposed patients are identified and monitored for a period of time, and outcomes in the two groups are compared. For example, a cohort of cigarette smokers and nonsmokers could be monitored and the incidence of lung cancer in both groups measured. In these studies, the two groups may be different with respect to important determinants of outcome other than the exposure being studied (confounding variables). Researchers often can statistically adjust for these factors, but there may be other contributing factors of which they are unaware.
TABLE 2.7 Guidelines for Assessing a Study of Harm
Another method of assessing harm is through case-control studies. In these studies, patients with an outcome of interest (cases) are identified and compared with others who are similar in respects other than the outcome (controls). Exposure rates in the case and control groups are then compared to look at the association between the exposure and the outcome. For example, the smoking rate in a group of patients with lung cancer may be compared with a group of patients without lung cancer who are otherwise similar. These studies are subject to recall bias: patients with an illness may be more likely to recall or report an unusual exposure than those who are not ill. In addition, like cohort studies, they are limited by the possibility of differences in unidentified risk factors between the groups.
Cohort and case-control studies can also be used to assess potentially beneficial associations, as was done in studies that suggested a cardiovascular benefit of hormone replacement therapy. However, this benefit was not demonstrated when studied in an RCT (10), calling the purported benefit into question and highlighting the limitations of observational data.
Weaker designs for identifying risk or harm include cross-sectional studies, case series, and case reports. Cross-sectional studies can establish associations but not causal links. They are strengthened by statistical methods that control for confounding variables (potential determinants of harm other than the one of concern). Temporal relationships, however, are usually not established. In case reports or case series, adverse outcomes associated with a particular exposure are reported in a single patient or group of patients. These reports are useful for identifying potentially harmful exposures to be studied further, but they are weak evidence for a causal relationship by themselves. However, if the outcome is very harmful and otherwise rare, this kind of evidence may be sufficient to take action. This might occur, for example, when severe adverse reactions associated with a particular medication are reported, especially if safer alternatives exist. A recent example is troglitazone, which was taken off the market after case reports of severe hepatotoxicity associated with its use.
Cost-Effectiveness
In ambulatory practice, cost considerations arise frequently. Cost-effectiveness analyses evaluate health care outcomes in relation to cost. The primary goals are to determine the most efficient use of resources and to minimize the costs associated with the achievement of health goals and objectives. A common strategy for cost-effectiveness studies is to compare a novel approach or therapy with the current practice or standard of care. The time frame of the study should be long enough to allow for costs and long-term benefits to be realized. The perspective of the analysis takes into account who benefits from the intervention as well as who pays for it (society, the payer, or the patient). Cost-effectiveness analyses often rely on a number of assumptions, and small variations in one or more of these parameters can have a significant effect on the conclusions; a sensitivity analysis can help determine how sensitive the outcomes are to changes in the parameters.
TABLE 2.8 Guidelines for Assessing a Study with an Economic Analysis of Clinical Practice
Whether decisions are being made for a population (e.g., frequency of screening colonoscopy, drugs to be added to a formulary) or for a particular patient (e.g., choice of antihypertensive medicine), the potential benefits should be weighed against the resources used and money spent. Table 2.8 summarizes some guidelines for assessing evidence in studies performing economic analyses.
In cost-effectiveness analyses, costs usually are measured in monetary units (e.g., dollars) and a single clinical outcome is considered (e.g., mortality). In cost-utility analyses, multiple clinical outcomes, including quality of life, are represented, resulting in the calculation of quality-adjusted life years (QALYs). In both types of analyses, alternative diagnostic or therapeutic approaches are studied with a primary emphasis placed on economic considerations.
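The comparison of a new strategy with standard care is often summarized as an incremental cost-effectiveness ratio (cost per QALY gained). This sketch uses invented costs and QALY estimates purely for illustration, and the brief loop at the end mimics a crude one-way sensitivity analysis.

```python
def icer(cost_new, cost_standard, qalys_new, qalys_standard):
    """Incremental cost-effectiveness ratio: additional cost per
    additional quality-adjusted life year (QALY) gained."""
    return (cost_new - cost_standard) / (qalys_new - qalys_standard)

# Hypothetical inputs: the new strategy costs more but yields more QALYs.
base = icer(cost_new=12_000, cost_standard=4_000,
            qalys_new=6.4, qalys_standard=6.0)
print(f"ICER = ${base:,.0f} per QALY gained")  # ICER = $20,000 per QALY gained

# One-way sensitivity analysis: vary the assumed QALY gain and watch
# how much the conclusion moves.
for q in (6.2, 6.4, 6.8):
    print(f"QALYs = {q}: ${icer(12_000, 4_000, q, 6.0):,.0f} per QALY gained")
```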
Keeping Up
One of the major challenges to clinicians is keeping one's personal fund of medical knowledge current. Studies suggest that older practitioners are often “out of date” and tend to provide lower-quality care (11). For primary care practitioners, who are expected to know about a wide array of clinical topics, keeping up-to-date can be particularly difficult. It has been suggested that each practitioner should develop a personal mission as to the extent of “up-to-datedness” he or she hopes to achieve and maintain. Two questions that may help to better define this territory are (a) “What information do I need to have in my head to be satisfied with my knowledge base for the performance of my job?” and (b) “What information would I be embarrassed not to know?” (12).
One author estimated that if clinicians tried to keep up with the medical literature by reading one article each day, they would be 55 centuries behind in their reading after 1 year (see Sackett et al., Clinical Epidemiology, at http://www.hopkinsbayview.org/PAMreferences). In a seminal study, experienced clinicians in ambulatory practice said they had about two clinical questions per week that went unanswered; however, when shadowed in day-to-day practice, they were found to actually have about two unanswered questions for every three patients seen (13). Moreover, although these clinicians said that their main sources of information were textbooks and journals, their behavior showed that they got most of their clinical information from colleagues and drug detailers. Fortunately, in ambulatory medicine, some high-quality secondary (abstracting) publications exist that summarize, and often provide expert commentary on, clinical articles believed to be of particular importance (approximately 2% to 3% of articles screened from hundreds of journals) (14). Examples are the ACP Journal Club and Evidence-Based Medicine.
Scheduling time to find and obtain relevant reading material is a critical step in keeping up-to-date. The actual reading of the pulled material can occur either in the scheduled time or when a lull presents itself (e.g., a patient no-show). Proactive scanning or browsing through a small number of peer-reviewed journals that regularly yield articles relevant to one's clinical practice is an integral part of keeping up. Reactive learning (also called problem-focused learning) is stimulated by clinical encounters or questions from patients or medical learners and requires searching to find the appropriate materials (steps 1 and 2 of the core EBM skills described at the beginning of this chapter). Sackett described the “educational prescription” as a means of phrasing and keeping track of questions as they arise, with the goal of searching later for the best available evidence to answer them. A combination of proactive and reactive approaches is thought to represent the ideal balance for dealing with the evolution of medical knowledge. Several additional ideas have been suggested by authors who have pondered the challenge of keeping clinically up-to-date (Table 2.9) (15).
Although original research articles continue to be an excellent source for new information, other types of publications can also be helpful in the quest to stay current. One common source of medical information is the overview. The chapters of this book (and other textbooks) are one example of an overview; review articles in medical journals are another. These types of overviews are easy to access (especially if the textbook is at hand) and easy to use; they require little work or effort to obtain needed information. However, they are limited by the biases and limitations of the authors and typically do not explain how the information was gathered or how conclusions were reached.
TABLE 2.9 Elements of an Information Plan
Systematic reviews and meta-analyses published in peer-reviewed journals with detailed methods describing specifically the literature search and the inclusion/exclusion criteria of the original articles can be invaluable. Critical appraisal methods for these two article types have been developed and can be applied to evaluate the quality of the work (16); Table 2.10 summarizes these methods. Some of the limitations that need to be considered include the heterogeneity of studies (with regard to populations studied and outcomes assessed) and the fact that small studies with negative results are less likely to be published than those with positive results (publication bias). Authors often try to correct for these limitations, but meta-analyses have sometimes yielded results and conclusions that were discordant with subsequent large RCTs (17). Nevertheless, meta-analysis can be a powerful tool to synthesize the available evidence in an unbiased fashion. In addition to those published in medical journals, the Cochrane Collaboration (and the Cochrane Library; see http://www.hopkinsbayview.org/PAMreferences) represents an international endeavor to develop, maintain, and disseminate systematic reviews on clinical and health-related topics.
TABLE 2.10 Guidelines for Assessing a Review Article
Guidelines are systematically developed statements that offer recommendations to assist with decision making in specific situations. It has been found that clinicians often do not employ effective interventions (e.g., prescribing beta-blockers to patients after a myocardial infarction). Guidelines serve the dual purpose of offering easily accessible recommendations for practitioners and publicizing these recommendations to the general public. Guidelines typically are developed by expert panels. They are best when they employ explicit criteria for gathering the evidence and making recommendations and when they acknowledge the level of evidence for each recommendation. Guidelines may be biased by the composition of the expert panel, and sometimes conflicting guidelines are disseminated by different organizations. For example, the American Urological Association recommends offering prostate-specific antigen (PSA) determinations to screen for prostate cancer, whereas the United States Preventive Services Task Force does not. Table 2.11 lists some suggestions for evaluating practice guidelines.
Each information source has strengths and weaknesses. Colleagues may be misinformed. Drug detailers have a product to sell, making them biased. Textbooks are often out of date by the time they are printed. Traditional continuing medical education courses provide variable degrees of evidence-based education and have been shown to have little effect on practice.
TABLE 2.11 Guidelines for Assessing a Practice Guideline
Because “keeping up” with the medical literature represents a colossal challenge, some authors have provided some direction for how to optimize the chance that one's time investment will result in a reasonable return (18,19). They suggest that the usefulness of medical information for a given provider is proportional to its relevance, validity, and accessibility. Relevance relates to the frequency with which the provider encounters the topic. Validity refers to the quality of the information and the likelihood that the information is true. Accessibility connotes the ease with which the information source can be retrieved. These authors recommend that practitioners seek out information sources that are relevant, valid, and easily accessible.
Finally, medical librarians can be extraordinarily helpful in keeping clinicians in touch with changes in the medical literature, and most are happy to meet with clinicians to make them aware of new resources. Befriending one's medical librarian is a critical component of a “keeping up” strategy and can pay huge dividends in the pursuit of evidence-based medical practice.
Specific References
For annotated General References and resources related to this chapter, visit http://www.hopkinsbayview.org/PAMreferences.