Chapter 2
1. At first glance this study might appear to be cross-sectional because a telephone interview was used. However, the data collected pertained to exposures that occurred prior to development of the disease (for the cases). Therefore, this study is best described as case–control.
2. This is not a classic crossover clinical trial because patients who received the therapy were not subsequently given a placebo. Why would a modified crossover study be necessary in this study?
3. This study design is interesting, and some might be tempted to call it case–control. The purpose was to compare self-reported therapy with actual therapy. Both of these measures (self-reported and actual) occurred at the same time, so the study also has some characteristics of a cross-sectional study. Our opinion is that the study is case–control, but it illustrates the difficulty in classifying studies cleanly into any scheme.
4. The study is observational and evaluates the clinical experiences of students who felt competent and those who did not feel competent in performing cancer screening exams. Because the direction of the question is looking back to learn about clinical experiences, the study is a case–control study.
5. The study is a placebo- (diuretic) controlled randomized clinical trial. In actuality, patients were randomly assigned to different regimens, although we did not provide that information in the description.
6. The group of subjects was identified and initial data collection occurred in 1976; the same subjects were followed up in future years. The study design is therefore best described as a cohort or prospective study to identify risk factors.
7. This study collects two kinds of information (physical signs and a laboratory procedure) to compare the diagnostic accuracy of the physical signs. Although there may be a few days between the signs and the result of the lumbar puncture, the research question is the relationship between the two. Therefore, this is a typical case–control study.
8. This study examines the relationship between risk factors (provider volume) and outcomes in a group of patients undergoing surgery. The research question was clearly forward—what will happen—although the investigators used an existing database. We therefore classify this as a historic cohort study.
9. The best design is cohort study, which follows a group of subjects over time to see whether lung cancer develops more often in those with a positive screening test; however, this study design would take several years to complete. To avoid the extended time period, these investigators collected data on patients with and without lung cancer and looked back at their screening test results. Because the direction of the question is backward, the study would be best called case–control.
10. Several study designs are possible, but the most realistic is an observational study. A case–control study provides information faster, but several case–control studies are required; a group of cases for each cause of death and type of cardiovascular morbidity needs to be identified. We suggest you use one of the electronic search programs to see if you can locate a copy of a cohort study, and, if so, read and discuss the pros and cons of how the study was done.
Chapter 3
1. The sum of the deviations (-0.08, -0.13, -0.17, …, 0.60, 0.16, -0.25) is zero.
2.
a. Using the Descriptive Statistics program in NCSS or the Descriptives program in SPSS, the mean heart rate variation for the 580 subjects was 49.7 with a standard deviation of 23.4.
b. See Table B-1.
c. See Figure B-1.
Table B-1. Frequency distribution of heart rate variation (RR_VAR).
3.
4.
a. The lower norm is the value represented by the 14.5th person [580 subjects × 0.025 (the 2½ percentile) = 14.5]. The upper norm is the value represented by the 565.5th subject [580 × 0.975 (the 97½ percentile) = 565.5]. We generated a stem-and-leaf plot and then counted in from both ends to obtain approximately 12 for the lower norm and 103 for the upper norm (Box B-1). Alternatively, you might generate a frequency distribution and find the values defined by the 2½ and 97½ percentiles. These values differ only slightly from the values of 12.8 and 103.5 published by the authors; the discrepancy is due to interpolation by the authors.
Figure B-1. Frequency table of heart rate variation to deep breathing. (Data, used with permission, from Gelber DA, Pfeifer M, Dawson B, Schumer M: Cardiovascular autonomic nervous system tests: Determination of normative values and effect of confounding variables. J Auton Nerv Syst 1997;62:40-44. Plot produced with NCSS; used with permission.)
5. Age was available for 490 patients. The mean age must be estimated using the weighted means formula. It is found by multiplying the midpoint of each age interval by the count:
The mean heart rate variation is estimated by multiplying the count by the mean in each age group:
The means calculated from the raw data are 34.5 years and 50.2 heart rate variation. The heart rate variation is a very good estimate because the means in each age group are used. The age is a good estimate because this table follows the rules for good frequency construction by choosing class limits so that most of the observations in the class are closer to the midpoint of the class than to either end of the class.
6. See Table B-2. The odds ratio is 1.90 and indicates the risk of hematuria is almost twice as great if a patient had RBC units > 5.
7.
a. Bimodal with one peak in the early 20s and with a smaller peak in the 40s and 50s.
Table B-2. Contingency table for red blood cell count and hematuria and odds ratio.
8.
9.
10.
11.
a. Skewed positively, with many physicians delivering 30-60 babies (family or general practitioners) and fewer delivering 200-250 (obstetricians); the distribution might also be bimodal with one mode at 50 babies and another at 225.
b. Probably bell-shaped, with a few hospitals referring a very small number of patients and a few referring a very large number of patients, but with the majority referring moderate numbers of patients.
12. See Figure B-2.
13.
14.
a. Both exposure and outcome are nominal, and the study used a case–control design, so the odds ratio would be appropriate to look at the relationship between exposure and cryptosporidiosis.
b. Both exposure and reaction are nominal variables. The study was a clinical trial, so the relative risk is appropriate, as are the absolute risk reduction and the number needed to treat.
c. Generally, a characteristic such as one's assessment of competency is measured using an ordinal scale that might have five or so divisions, such as 1 = not competent to 5 = very competent. In this situation, the median is the preferred statistic.
Figure B-2. A larger percentage of women had very low scores at both times, but, overall, the distributions are very similar. Both men and women had higher scores at time 3. (Data, used with permission, from Hébert R, Brayne C, Spiegelhalter D: Incidence of functional decline and improvement in a community-dwelling very elderly population. Am J Epidemiol 1997;145:935-944. Graphs produced using SPSS, used with permission.)
d. Duration of therapy is a numerical variable and occurrence of coronary heart disease is nominal. Therefore, a histogram, box plot, or error bar plot would all be appropriate.
e. Both provider volume (number of cases) and complication rates (number of complications in a given period) are numerical scales. A scatterplot would be very effective in demonstrating the relationship between provider volume and complication rate.
15.
a. Salaries probably do not follow a bell-shaped curve; they tend to have a positive skew, with a few physicians making relatively larger salaries. The median and either the range or the interquartile range are therefore best.
b. Standardized ability and achievement tests administered to large numbers of examinees tend to follow a bell-shaped curve; therefore, the mean and the standard deviation are appropriate.
c. A bell-shaped distribution is a reasonable assumption, so the mean and the standard deviation are used.
d. The number of tender joints probably has a positively skewed distribution, so the median and the range or interquartile range are appropriate.
e. Presence of diarrhea either occurs or does not; therefore, this is a nominal characteristic, and proportions or percentages are correct.
f. This ordinal scale calls for the use of the median and the range; alternatively, proportions or percentages may be used.
g. Somewhat negatively skewed, with the majority of females developing the disease at ages 50-70 years, so the median and range are appropriate.
h. Assuming compliance is fairly good, the distribution has a positive skew, with most patients having a low pill count, and the median and range may be used.
16. The mean heart rate variation appears to become smaller as subjects age. A correlation is best for measuring the relationship between two numerical measures; the correlation between age and heart rate variation is -0.45.
17. A correlation of -0.45 indicates a negative relationship between age and heart rate variation: As subjects get older, their heart rate variation decreases. A correlation of this magnitude indicates a fair degree of relationship, and it indicates that having norms for each age group is a good idea if the number of subjects is sufficient. Unfortunately, the number of subjects in each age group is not large enough to permit the calculation of age-adjusted norms.
18.
a. Approximately 25 lb, or 11.5 kg.
b. Approximately 18.9 in., or 48 cm.
c. Approximately 16 lb, or 7 kg.
19. The coefficient of variation for men is 10.75/7.27, or 147.9%; for women it is 7.22/6.11, or 118.2%. Therefore, in this small sample, men had more variability in their red blood cell counts than did women.
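The arithmetic in this answer can be checked with a few lines of Python. This is a minimal sketch: the function name is illustrative, and the numerator and denominator are taken in the order printed above (standard deviation over mean).

```python
def coefficient_of_variation(sd, mean):
    """Return the coefficient of variation (SD / mean) as a percentage."""
    return 100.0 * sd / mean

# Values as printed in the answer: SD first, then mean.
cv_men = coefficient_of_variation(10.75, 7.27)
cv_women = coefficient_of_variation(7.22, 6.11)

print(f"men: {cv_men:.1f}%  women: {cv_women:.1f}%")
```

Because the coefficient of variation is unitless, it allows the two groups to be compared even though their means differ.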
a. The risk ratio using person-years of observation is
The risk ratio for person-years is similar in size to the value based on subjects because both the aspirin and control groups were observed for similar periods of time. They differ if one group is followed for a substantially longer period than the other.
b. This statistical adjustment (described in Chapters 8 and 10) controls for differences in age and use of beta-carotene in the aspirin and control groups by calculating the value of the relative risk that would occur if no differences existed.
c. It is possible to speculate that the subjects in this study, all physicians who agreed to participate in the study, are more health-conscious than the general public. If so, the effects of aspirin and beta-carotene, when added to the already healthy life-style factors, might have a smaller incremental effect than in the general population.
20. The odds that a person with a stroke abuses drugs are
The odds that a person without a stroke abuses drugs are
Therefore, the odds ratio is 0.518/0.092 = 5.64; that is, a person in this study who abuses drugs is more than five times more likely to have a stroke.
21.
a. The purpose of the study was to learn about acute effects of acetylsalicylic acid (ASA) on the gastroduodenal mucosa in young and old healthy men.
b. The study design was an experiment because a manipulation was made, and each subject was used as his own control; therefore, it is a self-controlled clinical trial.
c. Two groups were used because part of the purpose was to determine if differences existed between young and old men.
d. Figure 1 in Moore and colleagues (1991) indicates that a dose-response relationship exists between the total number of lesions observed on endoscopic study: The smallest number was observed under the placebo condition, and the largest number when subjects received 1300 mg aspirin. The long whiskers and dots representing the extreme values when subjects received 1300 mg aspirin show that considerable variability exists between subjects in the total number of lesions observed.
e. Figure 2 in Moore and colleagues (1991) indicates that significantly more variability occurs between subjects at the higher ranges of pH values than at the lower ranges. The figure legend points out that the median or middle value for three of the plots is at the very bottom of the plot; this means that half of the men had pH values under 1.0.
Box B-1. Stem-and-leaf plot of heart rate variation to deep breathing.
Selected Percentiles from Percentile Section of RR_VAR
Chapter 4
1.
a. To show that gender and blood type are independent, we must show that
for each cell in the table:
b.
This demonstrates that P(male | type O) = P(male) when these are independent.
2. Assuming 47 patients were in the study,
a.
b.
c.
d.
e.
Using the binomial distribution, we get
3.
a.
b.
c.
4. Use the binomial distribution.
a. P(infection) = 0.30
b.
5. λ = 1487/390 = 3.81. The probability of exactly five hospitalizations is
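The Poisson probability for this answer can be sketched as follows. The helper name `poisson_pmf` is illustrative; the code simply applies the standard Poisson formula P(X = k) = e^(-λ) λ^k / k! with λ computed from the counts given above, so the result should be read as a check on hand calculation rather than as the printed answer.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean number of events is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 1487 / 390            # mean hospitalizations, about 3.81
p_five = poisson_pmf(5, lam)

print(f"lambda = {lam:.2f}, P(X = 5) = {p_five:.3f}")
```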
6.
a. The probability that a normal healthy adult has a serum sodium above 147 mEq/L is P(z > 2) = 0.023.
b. P[z < (130 - 141)/3] = P(z < -3.67) < 0.001
c.
d. The top 1% of the standard normal distribution is found at z = 2.326; therefore, 2.326 = (X - 141)/3, or X = 147.98. So a serum sodium level of approximately 148 mEq/L puts a patient in the upper 1% of the distribution.
e. The bottom 10% of the standard normal distribution is found at z = -1.28; therefore, -1.28 = (X - 141)/3, or X = 137.16. So a serum sodium level of approximately 137 mEq/L puts a patient in the lower 10% of the distribution.
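The serum sodium calculations in this exercise can be reproduced with Python's standard library. This sketch assumes the normal distribution with mean 141 and standard deviation 3 implied by the z transformations above; the variable names are illustrative.

```python
from statistics import NormalDist

# Serum sodium in healthy adults: mean 141 mEq/L, SD 3 mEq/L (from the answer above).
sodium = NormalDist(mu=141, sigma=3)

p_above_147 = 1 - sodium.cdf(147)          # part a: P(z > 2)
p_below_130 = sodium.cdf(130)              # part b: P(z < -3.67)
top_1_percent = sodium.inv_cdf(0.99)       # part d: cutoff for the upper 1%
bottom_10_percent = sodium.inv_cdf(0.10)   # part e: cutoff for the lower 10%

print(f"P(X > 147) = {p_above_147:.3f}")
print(f"P(X < 130) = {p_below_130:.5f}")
print(f"upper 1% cutoff = {top_1_percent:.2f} mEq/L")
print(f"lower 10% cutoff = {bottom_10_percent:.2f} mEq/L")
```

Software uses unrounded z values, so the cutoffs may differ from table-based answers in the second decimal place.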
7. The distributions are given in Table B-3.
Thus,
Graphs of the preceding distributions illustrate that the binomial distribution is quite skewed when the proportion is 0.1 (as well as when the proportion is close to 1.0, such as 0.9 and 0.8). When the proportion is near 0.5, the distribution is nearly symmetric—perfectly so at 0.5.
8.
a.
and the standard deviation is
(Note that the standard deviation uses 5 in the denominator instead of 4 because we are assuming that these 5 observations make up the entire population.)
b.
Table B-3. Binomial distributions for different values of the parameter, π.
c. and the standard deviation of the mean (or the standard error of the mean, SE) is
d.
e. Note that the mean of the means found in part b is the same as the mean found in part a, and the SE is the same as
f.
9.
a. This question refers to individuals and is equivalent to asking what proportion of the area under the curve is greater than (103 - 100)/3 = +1.00 and less than (97 - 100)/3 = -1.00, using the z distribution; the area is 0.317, or 31.7%, from Table A-2.
b. This question concerns means. The standard error with n = 36 is 3/6 = 0.5. The critical ratio for a mean equal to 99 is (99 - 100)/0.5 = -2.00 and for 101 is +2.00. The area below -2.00 and above +2.00 is 0.046; therefore, 4.6% of the means are outside the limits of 99 and 101.
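The contrast between individuals (part a) and means (part b) can be sketched in Python. The mean of 100, standard deviation of 3, and n = 36 come from the answer above; the variable names are illustrative.

```python
from statistics import NormalDist

mean, sd, n = 100, 3, 36

# Part a: individual observations outside 97 to 103.
individuals = NormalDist(mean, sd)
p_individuals = individuals.cdf(97) + (1 - individuals.cdf(103))

# Part b: sample means outside 99 to 101; the spread is the standard error.
standard_error = sd / n ** 0.5
means = NormalDist(mean, standard_error)
p_means = means.cdf(99) + (1 - means.cdf(101))

print(f"individuals outside 97-103: {p_individuals:.3f}")
print(f"SE = {standard_error}, means outside 99-101: {p_means:.3f}")
```

The only change between the two parts is the spread: the standard deviation for individuals and the standard error for means.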
10.
a. The standard deviation is
Then the probability that a patient is intoxicated more than 104 times per year is
From Table A-2, the P(z > 0.60) is 0.274, so we know the probability is slightly more.
b. From Table A-2, the value of z that separates the upper 5% of the distribution from the lower 95% is 1.645. We need to find X so that
Solving for X gives X = (1.645 × 71.0) + 61.6 = 178.4, or approximately 179 times a year.
c. Approximately 5% of medical students graduate with a debt of $200,000, and this is $96,000 above the mean of $104,000. The point in the normal distribution that separates the top 5% is 1.645. Therefore, $96,000 is 1.645 standard deviations; we divide by 1.645 to find that one standard deviation is $58,359.
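The three parts of this exercise use the same z relationship in different directions. The sketch below assumes the mean of 61.6 and standard deviation of 71.0 that appear in the solution for part b; variable names are illustrative.

```python
from statistics import NormalDist

# Parts a and b: times intoxicated per year, mean 61.6 and SD 71.0.
intoxication = NormalDist(mu=61.6, sigma=71.0)
p_over_104 = 1 - intoxication.cdf(104)      # part a
upper_5_cutoff = intoxication.inv_cdf(0.95)  # part b

# Part c: $200,000 marks the top 5%, and the mean debt is $104,000,
# so the distance from the mean equals 1.645 standard deviations.
z_95 = NormalDist().inv_cdf(0.95)
sd_debt = (200_000 - 104_000) / z_95

print(f"P(X > 104) = {p_over_104:.3f}")
print(f"upper 5% cutoff = {upper_5_cutoff:.1f} times per year")
print(f"SD of debt = ${sd_debt:,.0f}")
```

The debt figure differs by a few dollars from the hand calculation because the code uses an unrounded z value rather than 1.645.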
Chapter 5
1.
a. Wider.
b. Increase the sample size to obtain a narrower confidence interval.
c. Narrower.
2. The P value is listed as 0.000. It is customary, however, to report such a P value as P < 0.001; computer programs typically print out only three decimal places and the probability would be given as greater than 0 were it not for this limitation. The 95% confidence interval does not contain zero, so we know the P value is < 0.05. We can request a 99.9% confidence interval if we want more precision, as illustrated in the lower part of Table B-4, produced using SYSTAT.
3. The two-tailed z value for α of 0.05 is ±1.96, and the lower one-tailed z value related to a power of 0.80 (β = 0.20) is approximately -0.84. With a standard deviation of 3 and a 2-oz difference, the sample size is approximately 18. This is much less than the sample needed to find a 1-oz difference.
Table B-4. Hypothesis test and confidence interval for mean soda consumption among 2-year-old children.
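A short sketch of the sample size arithmetic follows. It assumes the usual one-sample formula n = [(zα - zβ)σ / δ]², which reproduces the figure of 18 quoted above; the function name is illustrative.

```python
import math

def sample_size_one_mean(z_alpha, z_beta, sd, difference):
    """Sample size to detect `difference` with the given z values (rounded up)."""
    return math.ceil(((z_alpha - z_beta) * sd / difference) ** 2)

n_for_2_oz = sample_size_one_mean(1.96, -0.84, sd=3, difference=2)
n_for_1_oz = sample_size_one_mean(1.96, -0.84, sd=3, difference=1)

print(f"2-oz difference: n = {n_for_2_oz}")
print(f"1-oz difference: n = {n_for_1_oz}")
```

Halving the difference to be detected roughly quadruples the required sample, because the difference enters the formula squared.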
4. We found a needed sample size of 250 to detect a difference between 95% and 90%. For 80% power to find a difference between 97% and 90%, a sample of 109 is adequate. A smaller sample size is needed when the difference we want to detect is larger.
5. Our power calculations assumed SD = 3 oz, and Dennison and coworkers (1997) had an SD of 4.77. This illustrates the role of the standard deviation in finding the sample size.
6.
Solving for X:
7. See Table B-5.
By chance alone, the two physicians would be expected to agree that 30/50 × 35/50 = 60% × 70% = 42% of the mammograms were negative and that 40% × 30% = 12% were positive, for a total of 54% of the mammograms. They actually agreed on 25 + 10 = 35 of 50, or 70%.
Table B-5. Classification of a sample of 50 mammograms by two physicians.
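Observed and chance agreement are the two ingredients of the kappa statistic, κ = (observed - expected) / (1 - expected). The sketch below is an illustration only: the 2 × 2 cell counts are inferred from the marginal totals and agreement counts quoted above, and the assignment of physicians to rows and columns is an assumption.

```python
# Rows: one physician (negative, positive); columns: the other physician.
# Cell counts inferred from the quoted totals (30/20 and 35/15) and agreements (25, 10).
table = [[25, 5],
         [10, 10]]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

observed = sum(table[i][i] for i in range(2)) / n
expected = sum(row_totals[i] * col_totals[i] for i in range(2)) / n ** 2
kappa = (observed - expected) / (1 - expected)

print(f"observed = {observed:.2f}, expected = {expected:.2f}, kappa = {kappa:.2f}")
```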
8.
a. A histogram using NCSS is reproduced in Figure B-3 and indicates that the distribution is fairly normal, so we will proceed with the t test.
b. Selected output from the SPSS procedure for the paired t test is reproduced in Table B-6. Note that the mean in the one-sample test is the same as the mean difference (except for the sign) in the paired t test, as are the t value and the probability level. The two approaches therefore give identical results.
9.
a. A histogram of the change in depression scores in Figure B-4 shows a fairly normal distribution, so it is appropriate to use the paired t test to learn if the change is significantly different from zero.
b. The McNemar statistic indicates no change in the proportion of people who were depressed. See Table B-7.
c. Both the paired t test and the McNemar test lead to the same conclusion.
Figure B-3. Change in HDL between baseline and 3 months. [Data, used with permission, from Sauter GH, Moussavian AC, Meyer G, Steitz HO, Parhofer KG, Jungst D: Bowel habits and bile acid malabsorption in the months after cholecystectomy. Am J Gastroenterol 2002;97(2): 1732-1735. Analyzed with NCSS; used with permission.]
10.
a. The distribution of ounces is skewed to the right (see Figure B-5). A nonparametric procedure is therefore recommended.
b. Selected output from the NCSS One-Sample t Test is reproduced in Table B-8. The observed value of the t test is 9.7483 with a P value of 0.000 (although we know the value should be reported as P < 0.001), and the decision is to reject the null hypothesis that the mean daily juice consumption in 5-year-old children is equal to zero. The probability for the sign test is also given as 0.000; thus, in this example, the two procedures lead to the same conclusion.
c. The box plots in Figure B-6 show that the distribution of juice consumption is slightly less among 5-year-olds. It does not appear to be a lot less, however. Exercise 13 in Chapter 6 asks you to perform the statistical test to compare the two groups of children.
Chapter 6
1. The critical value of t is larger, 2.13 using interpolation, and the pooled standard deviation and standard error must be recalculated. The 95% confidence interval is
This CI is wider than the one in Section 6.2 and now contains zero. We would therefore conclude that no difference exists in pulse oximetry if only 25 patients were in each group.
2.
3.
a. The group with DI < 10 and the group with DI ≥ 10 are two independent groups. A confidence interval for the difference in Barthel Index therefore uses degrees of freedom equal to the sum of the sample sizes minus 2. Table B-9 shows the output from NCSS.
The 95% confidence limits are -0.12 to 25.35. Because the interval contains zero, it is possible that the difference is zero and that the two groups do not differ on BI at discharge.
b. Answering this question requires us to look at the same group (all the patients) twice, so the analytic method is the paired t. From NCSS, the mean difference in BI from admission to discharge is -26.81, and the standard deviation is 12.49.
Table B-6. Equivalence of results from paired t test and one-sample t test.
Figure B-4. Histogram of depression score changes. (Data, used with permission, from Henderson AS, Korten AE, Jacomb PA, Mackinnon AJ, Jorm AF, Christensen H, et al: The course of depression in the elderly: A longitudinal community-based study in Australia. Psychol Med 1997;27:119-129. Output produced using NCSS; used with permission.)
Table B-7. Equivalence of results from paired t test on the mean difference and the McNemar test for paired proportions of number of depressive symptoms.
or -30.4745 to -23.14252 using NCSS. Because the interval does not contain zero, it is unlikely that the difference in BI from admission to discharge is zero. In fact, we can be 95% sure it has increased between approximately 23 and 30 points.
4. If sample sizes are equal, n₁ = n₂, so we can use simply n in the formula for the pooled SD. The result is that the pooled SD is the square root of the mean of SD₁² and SD₂².
Figure B-5. Juice consumption in 5-year-olds. (Data, used with permission, from Dennison BA, Rockwell HL, Baker SL: Excess fruit juice consumption by preschool-aged children is associated with short stature and obesity. Pediatrics 1997;99:15-22. Output produced using NCSS; used with permission.)
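The simplification can be written out in one line. The derivation below assumes the usual pooled standard deviation formula for two independent groups and substitutes n for both sample sizes.

```latex
SD_p = \sqrt{\frac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}}
\quad\xrightarrow{\;n_1 = n_2 = n\;}\quad
SD_p = \sqrt{\frac{(n - 1)\left(SD_1^2 + SD_2^2\right)}{2(n - 1)}}
     = \sqrt{\frac{SD_1^2 + SD_2^2}{2}}
```

The factor (n - 1) cancels, so with equal sample sizes the pooled variance is the simple average of the two variances.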
5. The null hypothesis is that mean pulse oximetry was the same in patients with and without a PE, and the alternative hypothesis is that it was not the same. The t test can be used for this research question; we use an α of 0.05 so results can be compared with the 95% confidence interval. The degrees of freedom are 181 + 740 - 2 = 919, so the critical value is 1.96. The pooled standard deviation, found in Section 6.2, is 4.44. So the t statistic is
The absolute value of t, |-6.49|, is 6.49, greater than the critical value of 1.96, so we reject the null hypothesis and conclude that, on the average, patients who had a PE had a lower pulse oximetry than patients who did not have a PE.
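The test statistic can be rebuilt from the quantities quoted in this answer. This sketch assumes a mean difference of -2.4 (the magnitude appears in the confidence interval from Section 6.2, and the sign follows the negative t value); the variable names are illustrative.

```python
import math

pooled_sd = 4.44          # from Section 6.2, as quoted above
n_1, n_2 = 181, 740       # group sizes from the degrees of freedom
mean_difference = -2.4    # assumed sign; magnitude from the Section 6.2 interval

# Standard error of the difference between two independent means.
se_difference = pooled_sd * math.sqrt(1 / n_1 + 1 / n_2)
t_statistic = mean_difference / se_difference
critical_value = 1.96

print(f"SE = {se_difference:.2f}, t = {t_statistic:.2f}")
print("reject H0" if abs(t_statistic) > critical_value else "do not reject H0")
```

The unrounded standard error gives a t value slightly larger in magnitude than the printed -6.49, which is based on the rounded standard error of 0.37; the conclusion is unchanged.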
6.
a. Table B-10 shows the output from NCSS. Because the chi-square value is not significant, we conclude that no association exists.
b.
Table B-8. Equivalence of results from t test and sign test for juice consumption in 5-year-old children.
Figure B-6. Juice consumption by 2- and 5-year-olds. (Data, used with permission, from Dennison BA, Rockwell HL, Baker SL: Excess fruit juice consumption by preschool-aged children is associated with short stature and obesity. Pediatrics 1997;99:15-22. Output produced using NCSS; used with permission.)
c.
d.
e. An odds ratio of 1.29 indicates that women who experience either pain or cramping are 29% more likely not to have had a paracervical block. Based on the chi-square test, we expect that this value of the odds ratio is not significant. Chapter 8 presents confidence intervals for the odds ratio.
7. Let N be the total number of observations, A the number of observations in a given row, and K the number of observations in a given column. For example, for a table with two rows and three columns, we have Table B-11, in which we want to find the expected value for the cell with the asterisk (*).
The probability that an observation occurs in row A, P(A), is A/N, and the probability that an observation occurs in column K, P(K), is K/N. The null hypothesis being tested by chi-square is that the events represented by the rows and columns are independent. Using the multiplication rule for independent events, the probability that an observation occurs in row A and column K is P(A)P(K) = A/N × K/N = AK/N². Multiplying this probability by the total number of observations N gives the number of observations expected to occur in both row A and column K; that is, AK/N² × N = AKN/N² = AK/N, or the row total × the column total ÷ the grand total.
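The rule "row total × column total ÷ grand total" translates directly into code. The sketch below uses a hypothetical 2 × 3 table of counts chosen only for illustration; it is not the data from any exercise.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical observed counts: two rows, three columns.
observed = [[10, 20, 30],
            [20, 10, 10]]
expected = expected_counts(observed)

for row in expected:
    print([round(value, 1) for value in row])
```

The expected counts keep the same row and column totals as the observed table, which is why the chi-square test for a table with r rows and c columns has (r - 1)(c - 1) degrees of freedom.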
8. The rule of thumb for α equal to 0.05 and power equal to 0.80 has zα = 1.96 and zβ = -0.84 for all computations; that is, only σ and (μ0 - μ1) change. Therefore,
9. Output from PASS (Table B-12) indicates that 35 patients are needed in each group for 80% power.
10. The 95% confidence interval from Section 6.2 was 2.4 ± (1.96)(0.37) = 2.4 ± 0.73, or 1.67 to 3.13. To have 90% and 99% confidence intervals, only the value for t with 919 degrees of freedom needs to be changed.
Table B-9. Confidence limits for the difference in mean Barthel index.
Table B-10. Results of chi-square test for association between the occurrence of pain and cramping and having a paracervical block.
The 90% CI is 2.4 ± (1.645)(0.37) = 2.4 ± 0.61, or 1.79 to 3.01.
The 99% CI is 2.4 ± (2.576)(0.37) = 2.4 ± 0.95, or 1.45 to 3.35.
Lower confidence gives a narrower interval, and higher confidence requires a wider interval.
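The three intervals differ only in the multiplier applied to the standard error, which a short loop makes explicit. The difference of 2.4 and standard error of 0.37 are taken from the answer above; with 919 degrees of freedom the t values are effectively the z values used here.

```python
difference = 2.4
standard_error = 0.37

# Two-tailed critical values for each confidence level.
critical_values = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}

intervals = {}
for level, critical in critical_values.items():
    margin = critical * standard_error
    intervals[level] = (difference - margin, difference + margin)
    low, high = intervals[level]
    print(f"{level:.0%} CI: {low:.2f} to {high:.2f}")
```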
11. Chances are good that more variation exists in the number of procedures done in the midsized centers. The t test requires that the variances (or standard deviations) not be different in the two groups. Because the sample sizes are quite different, 60 compared with 25, it is possible that violating the assumption of equal variances resulted in a nonsignificant t test.
12. The t test or confidence limits for the difference in two means may be used.
a. First, find the pooled standard deviation:
Second, use the pooled SD to find the standard error of the difference in two means:
Table B-11. Contingency table with two rows and three columns.
Table B-12. Power analysis using NCSS.
Third, find the critical value from Table A-3, which for α = 0.05, is approximately 1.99.
Fourth, perform the t test:
and, since |-2.11| > 1.99, reject the null hypothesis of no difference and conclude that there were significant differences in the operating room time.
Alternatively, form a confidence interval: -19.15 ± (1.99)(9.07) = -37.19 to -1.11 (using NCSS for the calculations; if you do them by hand, you may obtain slightly different numbers). Because zero is not within the interval, we again conclude there is a significant difference.
b. Placing the largest variance in the numerator for the F test, we have 46.84²/38.32² with 40 - 1 and 48 - 1 degrees of freedom. From Table A-4, the critical value at 0.05 is approximately 1.66 with interpolation. The observed value of the F ratio is 1.49, and because it is less than the critical value, we conclude that the variances are not different and it is appropriate to use the t test.
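The four steps in part a and the F ratio in part b can be sketched together. The standard deviations and sample sizes are paired as in the F test above (46.84 with n = 40, 38.32 with n = 48), and the mean difference of -19.15 is taken from the confidence interval; variable names are illustrative.

```python
import math

n_1, sd_1 = 40, 46.84
n_2, sd_2 = 48, 38.32
mean_difference = -19.15

# Step 1: pooled standard deviation.
pooled_variance = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
pooled_sd = math.sqrt(pooled_variance)

# Step 2: standard error of the difference in two means.
se_difference = pooled_sd * math.sqrt(1 / n_1 + 1 / n_2)

# Steps 3 and 4: compare the t statistic with the critical value (about 1.99).
t_statistic = mean_difference / se_difference

# Part b: F ratio with the larger variance in the numerator.
f_ratio = sd_1 ** 2 / sd_2 ** 2

print(f"pooled SD = {pooled_sd:.2f}, SE = {se_difference:.2f}")
print(f"t = {t_statistic:.2f}, F = {f_ratio:.2f}")
```

Hand or scripted calculations may differ from the NCSS figures in the last decimal place, as the answer itself notes.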
13. Box plots are given in Figure B-7. The group not having a PE has one person with an extraordinarily low pH. This value contributes greatly to the skew of the distribution. The F test may have been significant because it is so sensitive to departures from normality. In this situation we would accept the findings of the Levene test. We also suggest that the value be double-checked to see if an error in recording was made.
14. Selected output from the NCSS program for the two-group t test is given in Table B-13. Because the 95% confidence limits include zero (-0.410 to 2.417), we conclude that no difference exists in juice consumption between 2- and 5-year-olds.
Chapter 7
1. The ANOVA results are given in Table B-14. The observed value for the F ratio is 3.44 (shaded line in the ANOVA table), with 2 and 70 degrees of freedom. The critical value from Table A-4 with 2 and 70 degrees of freedom (the closest value) is approximately 3.14 at P = 0.05; therefore, there is sufficient evidence to conclude that the mean free T3 levels are not the same for all three groups of patients.
Figure B-7. Box plots of pH for patients with and without a pulmonary embolism. (Data, used with permission of the authors and publisher, Kline JA, Nelson RD, Jackson RE, Courtney DM. Criteria for the safe use of D-dimer testing in emergency department patients with suspected pulmonary embolism: A multicenter US study. Ann Emergency Med 2002;39:144-1524. Plot produced with NCSS; used with permission.)
Table B-13. t Test comparing mean daily juice consumption by 2- and 5-year-old children.
2. Refer to Table B-14; both the Tukey and Scheffé procedures indicate that groups 2 and 3 (or A and C) differ from each other (see shaded lines). However, group B is not different from either group A or group C.
3. Two comparisons are independent if they use nonoverlapping information.
a. Independent because each comparison uses different data.
b. Dependent because data on physicians are used in each comparison.
c. Independent because each comparison uses different data.
d. None of these three comparisons are independent from the other two because they all use data on medical students.
4.
a. This is a one-way, or one-factor, ANOVA because only one nonerror term occurs, the among-groups term.
b. Total variation is 2000.
c. There were four groups of patients because the degrees of freedom are 3. There were 40 patients because n - 4 = 36.
d. The F ratio is (800/3)/33.3 = 8.01.
e. The critical value with 3 and 30 degrees of freedom is 4.51; with 3 and 40 degrees of freedom is 4.31; interpolating for 3 and 36 degrees of freedom gives 4.39.
f. Reject the null hypothesis of no difference and conclude that mean blood pressure differs in groups that consume different amounts of alcohol.
A post hoc comparison is necessary to determine specifically which of the four groups differ.
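The pieces of this ANOVA table fit together as shown in the sketch below. The among-groups sum of squares of 800 and the degrees of freedom come from parts c and d; the within-groups sum of squares is derived here as 2000 - 800, which is an assumption consistent with the error mean square of 33.3.

```python
ss_total, ss_among = 2000, 800
df_among, df_within = 3, 36

ss_within = ss_total - ss_among       # derived, not printed in the answer
ms_among = ss_among / df_among
ms_within = ss_within / df_within     # about 33.3
f_ratio = ms_among / ms_within

# Linear interpolation between the tabled critical values for 30 and 40 df.
f_at_30, f_at_40 = 4.51, 4.31
f_critical = f_at_30 + (df_within - 30) / (40 - 30) * (f_at_40 - f_at_30)

print(f"MS within = {ms_within:.1f}, F = {f_ratio:.2f}, critical value = {f_critical:.2f}")
print("reject H0" if f_ratio > f_critical else "do not reject H0")
```

The unrounded error mean square gives F = 8.00; the printed 8.01 reflects rounding the error mean square to 33.3 before dividing.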
Table B-14. Answer to exercises 2 and 3 in Chapter 7.
5.
a. The assumption of sphericity is rejected for both the Barthel index and the Barthel*Hypertension interaction, meaning that the variances were not equal over time. A correction to the degrees of freedom must therefore be made.
b. SPSS gives the Greenhouse-Geisser and the Huynh-Feldt corrections. A lower bound correction is also available, which is the most conservative conclusion possible. In our example, all three indicate that a significant difference exists in the Barthel index measures over time.
c. The interaction is significant, and we conclude that pattern of changes in the Barthel index over the four periods is not the same for patients with hypertension as for those without hypertension. If you calculate the means for each period, the values for patients with hypertension increase more rapidly over time and continue to increase after the 3-month measurement, but the values for patients without hypertension increase less and are level after the 3-month measurement.
d. The results are not significant at the traditional 0.05 level (P value = 0.114), indicating that the Barthel index “collapsed over the periods” was not different for patients with versus those without hypertension. Because there was a significant interaction, however, we would not draw any conclusions about this factor.
6.
a. The P value for the interaction between thyroid status and weight is 0.728; therefore, it makes sense to continue to examine the main effects of thyroid and weight.
b. The P value for thyroid status is 0.54 and for weight is 0.50.
c. The conclusion is: The observed differences in glucose level were not large enough to conclude that hyperthyroid patients differed from controls, that overweight patients differed from normal weight patients, or that an interaction occurred between thyroid and weight status. We can therefore conclude that the four groups had similar mean levels of glucose.
7.
a. No, the P value is 0.545.
b. Yes, the P value is 8.49 × 10⁻⁶, or 0.00000849.
c. Yes, the interaction is also significant, with P = 0.00000580.
d. A significant interaction means that the effect of one of the factors, such as type of psoriasis, depends on the level of the other factor, age at onset. Therefore, it is not possible to draw any conclusions about the factors independently; we cannot say simply that %TBSA differs by familial vs sporadic type; we must note that the differences depend on age at onset.
Chapter 8
1.
a. The correlation of 0.42 indicates a fair degree of relationship between daily stool lipid content and energy.
b. Figure 8-18 was presented to show the relationship between fecal lipid level and fecal energy in the cystic fibrosis patients and the difference between the relationship in these patients and the relationship in control children. The correlation in control patients appears to be stronger than in the cystic fibrosis patients and to have a more positive slope.
2.
a.
where 132 and 133 are found by subtracting 260 and 244 from the total number of infants at risk in the TRH and placebo groups, respectively.
With 1 degree of freedom, this value is much smaller than the 3.841 associated with P = 0.05. The evidence is therefore insufficient to conclude that a significant relationship exists between TRH and the development of respiratory distress.
b. The relative risk of death among infants not at risk is
The 95% CI is
Because the confidence interval contains 1, the relative risk could be 1; therefore, the evidence is insufficient to conclude that a significant risk of death exists in infants who were given TRH.
3.
4.
a. Both the ratio of OKT4 to OKT8 cells and the lifetime concentrate use have skewed distributions, and the logarithmic transformation makes them more closely resemble the normal distribution.
b. r = -0.453 indicates a fair to moderate inverse (or negative) relationship; r² indicates that approximately 21% of the variation in one measure is accounted for by knowing the other.
c. The 95% confidence bands are specified as being related to single observations; therefore, 95% of predicted log (OKT4/OKT8) falls within these lines. Note also the appropriate slight curve of the bands.
5.
a. Case–control.
b. The odds ratio is (20 × 1157)/(41 × 121) = 4.66; 95% confidence limits are the antilogarithms of
c. The age-adjusted odds ratio is an estimate of the value of the odds ratio if the cases and controls had identical age distributions. Because the age adjustment increases the odds ratio from 4.66 to 8.1, the cases represented a younger group of women than the controls, and the age adjustment compensates for this difference.
Thus, controlling for age, we have 95% confidence that the interval from 3.7 to 18 contains the true increase in risk of deep vein thrombosis (pulmonary embolism) with the use of oral contraceptives.
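The crude odds ratio in part b, and one way to obtain confidence limits as antilogarithms, can be sketched as follows. The standard error used here is the usual large-sample formula for ln(OR); it is an assumption on our part that this matches the formula intended in the answer, and the variable names are ours.

```python
import math

# Cell counts as they appear in the odds ratio of answer 5b: (20 x 1157)/(41 x 121).
a, d = 20, 1157   # cells multiplied in the numerator
b, c = 41, 121    # cells multiplied in the denominator

odds_ratio = (a * d) / (b * c)

# Assumption: large-sample standard error of ln(OR), the square root of the
# sum of the reciprocals of the four cell counts.
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z = 1.96  # two-sided 95% critical value from the standard normal distribution

# The limits are computed on the log scale and then converted back (antilogarithms).
lower = math.exp(math.log(odds_ratio) - z * se_log_or)
upper = math.exp(math.log(odds_ratio) + z * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

These are limits for the crude odds ratio only; the age-adjusted interval quoted in part c requires the stratified data and is not reproduced by this sketch.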
6.
a.
Transforming the values 0.380 and 0.946 back to correlations gives r = 0.36 and r = 0.74. The 95% confidence interval for the observed correlation of 0.58 is therefore 0.36 to 0.74, and we are 95% confident that the true value of the correlation in the population is contained within this interval.
b. We used the nQuery Advisor program to illustrate the sample size needed in this situation; the output is given in Figure B-8. A sample of 143 women would be necessary.
Figure B-8. Illustration of sample size program nQuery Advisor. (Data, used with permission, from Hébert R, Brayne C, Spiegelhalter D: Incidence of functional decline and improvement in a community-dwelling very elderly population. Am J Epidemiol 1997;145:935-944. Figure produced using nQuery Advisor; used with permission.)
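The back-transformation described in answer 6a can be verified with the hyperbolic tangent, which is the inverse of Fisher's z transformation. The sketch below takes the limits 0.380 and 0.946 from the answer as given; it does not recompute them, because the sample size is not restated here.

```python
import math

r_observed = 0.58
z_lower, z_upper = 0.380, 0.946    # limits on the transformed scale, from answer 6a

z_observed = math.atanh(r_observed)  # Fisher's z transformation of the observed r
r_lower = math.tanh(z_lower)         # back-transform each limit to a correlation
r_upper = math.tanh(z_upper)

print(f"z = {z_observed:.3f}")
print(f"95% CI for r: {r_lower:.2f} to {r_upper:.2f}")
```

Note that the interval is symmetric around z on the transformed scale but not around r = 0.58 after back-transformation.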
7.
a. Bile acid synthesis, because the absolute value of the correlation is the highest.
b. Bile acid synthesis, because it has the highest correlation.
c. The relationships between age and bile acid synthesis look as if they are relatively similar for men and women; however, the relationship between cholesterol secretion and age appears to be more positive in women than in men, and the relationship between age and pool-size cholic acid appears to be more negative in women than men (because the slopes are steeper in women).
8. The residuals are given in Figure B-9. The residuals most resemble Figure 8-13B and indicate a linear relationship.
9. The regression line goes through the point (X̅, Y̅); therefore, the point at which X̅ intersects the regression line, when projected onto the Y axis, is Y̅′ = Y̅.
Figure B-9. A plot of residuals. (Data, used with permission of the author and the publisher, from Gonzalo MA, Grant C, Moreno I, Garcia FJ, Suarez AI, Herrera-Pombo JL, et al: Glucose tolerance, insulin secretion, insulin sensitivity and glucose effectiveness in normal and overweight hyperthyroid women. Clin Endocrinol 1996;45:689-697. Output produced using NCSS; used with permission.)
10. A positive correlation indicates that the values of X and Y vary together; that is, large values of X are associated with large values of Y, and small values of X are associated with small values of Y. A positive slope of the regression line indicates that each time X increases by 1, Y increases by the amount of the slope, thereby pairing small values of X with small values of Y; a similar statement holds for large values.
11.
a. The questions addressed by the study were: Is the pathogenesis of steroid-responsive nephrotic syndrome (SRNS) immune-complex-mediated? Does the clinical activity of the disease relate to the presence of circulating immune complexes?
b. Preexisting groups were used, patients with SRNS and patients with systemic lupus erythematosus (SLE), so the study was not randomized. No treatment was administered, so it was not a clinical trial. The observations on the variables of interest, IgG-containing complexes and C1q binding, were obtained at the same time, and the study question focused on “What is happening?” This study is therefore best described as cross-sectional.
c. Patients with and without evidence of active disease were studied. Patients with SLE were also studied because immune complexes are known to have a pathogenetic role in this disease.
d. According to Figure 8-21 the correlation between C1q-binding and IgG complexes for SLE patients is significant, r = 0.91, but this result in and of itself does not establish a cause-and-effect relationship. The correlation is not significant for patients with SRNS, but the sample size is relatively small, indicating low power to detect a significant relationship.
e. The authors state that the lines are 95% confidence limits for the patients with lupus. The lines are parallel, however, instead of curved, as they should be. They probably relate to individuals, because (1) they are described that way in the legend, and (2) limits for the mean would probably be closer to the regression line.
f. It's not possible to tell for sure; however, it looks as if they might. First, the correlation for SRNS patients is not significantly different from 0 and the correlation for SLE patients is 0.91; if correlations are different, so are the regression lines. The sample size of SRNS patients is very small, however, and may keep us from detecting a statistically significant difference.
g. No. The study population must be carefully defined to reduce the likelihood that patients with other disease processes are included.
Chapter 9
1.
a. See Table B-15 for the arrangements of the observations according to the length of time patients were in the study and survival probabilities.
b. The survival curve, produced with NCSS, is given in Figure B-10.
c. Information for the logrank statistic is as shown in Table B-16.
2.
a. There appears to be no difference in the lengths of time until disease progression in the vitamin C and placebo groups. In fact, the curve for vitamin C is lower than the curve for placebo at all points, indicating shorter times prior to disease progression in the vitamin C group.
b. Median time to disease progression was approximately 3 months in the vitamin C group and 4.5 months in the placebo group.
c. As you probably suspect, the authors found no significant differences in survival between patients receiving the vitamin C and those receiving placebo.
3.
a. The survival curves are given in Figure B-11. It appears that survival rates for the two treatment methods were similar. After 2 years, survival was slightly better in the group on traditional hemodialysis, but a statistical test is needed to learn if this slight difference is significant or could occur by chance.
Table B-15. Kaplan-Meier tables for patients who had kidney transplantation.
b. The logrank statistic is 0.92. Because this is a chi-square statistic with 1 degree of freedom, we know that the value does not reach statistical significance. We therefore conclude that these observations do not provide sufficient evidence for a difference in survival in the two treatment groups.
c. This study was not randomized. We therefore do not know the basis for choosing continuous ambulatory peritoneal dialysis or traditional hemodialysis. As a result, no conclusions should be drawn about any differences in treatment.
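The statement in part b that a chi-square value of 0.92 with 1 degree of freedom does not reach significance can be checked without tables. The sketch below uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability is erfc(sqrt(x/2)); the function name is ours.

```python
import math


def chi_square_1df_p_value(x):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


logrank = 0.92   # logrank statistic from answer 3b
critical = 3.841 # chi-square critical value for P = 0.05 with 1 degree of freedom

p_value = chi_square_1df_p_value(logrank)
print(f"P = {p_value:.3f}")
print("significant" if logrank > critical else "not significant at 0.05")
```

The resulting P value is roughly 0.34, well above 0.05, which agrees with the conclusion in part b.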
4. It is possible to estimate median survival for two of the groups. The patients with tumor stage T2b-c have a median survival of approximately 50 months. Those with a tumor stage of T3 or T4 have a median survival of a little less than 40 months.
5.
a. The Kaplan-Meier curves are given in Figure B-12.
b. The curves are widely separated, showing a large difference for those classified at high versus normal risk. The classification variable appears to be a valid indicator of risk.
6. The Mantel-Haenszel statistic is an excellent procedure to compare two distributions. Note that it is possible to estimate the Mantel-Haenszel statistic from the study's Figure 2.
Figure B-10. Kaplan-Meier survival curves for patients who had kidney transplantations. (Data courtesy of Dr. Alan Birtch; used with permission. Figure produced with NCSS; used with permission.)
7.
a. Survival was best for patients whose disease was restricted to a single vessel.
b. The highest mortality rate was observed among patients with left main vessel or triple-vessel disease.
c. The median survival for the group with single-vessel disease was approximately 18 years; for the other two groups it was approximately 12 years.
Chapter 10
1. For smokers, the adjusted mean is the mean in smokers, 3.33, minus the product of the regression coefficient, 0.0113, and the difference between the occlusion score in smokers and the occlusion score in the entire sample (estimated from Figure 10-3); that is,
Table B-16. Information on the logrank statistic.
Figure B-11. Kaplan-Meier survival curve comparing continuous ambulatory peritoneal dialysis to hemodialysis. (Data, used with permission, from Bajwa K, Szabo E, Kjellstrand CM: A prospective study of risk factors and decision making in discontinuation of dialysis. Arch Intern Med 1996; 156:2571-2577. Figure produced with NCSS; used with permission.)
Figure B-12. Kaplan-Meier survival curve for recurrence of stones in patients at high versus normal risk. (Data, used with permission, from Borghi L, Schianchi T, Meschi T, Guerra A, Allegri F, Maggiore U, et al: Comparison of two diets for the prevention of recurrent stones in idiopathic hypercalciuria. N Engl J Med 2002;346:77-88. Survival plot produced with NCSS, used with permission.)
Similarly, the adjusted mean in nonsmokers is
These are the same values we found (within round-off error) in Section 10.3.
2.
a. A scatterplot of the prediction of PSV using the carotid method is given in Figure B-13. The relationship appears to be curvilinear, mainly due to the three observations with zero CSI.
b. The correlation of PSV is 0.61 with the NASCET method and 0.67 with the CSU method.
c. The regression equation is Y = 74.2858 + 0.3837 X + 0.0477 X².
d. A stenosis of 60% predicts a PSV of:
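The prediction in part d follows from substituting X = 60 into the quadratic equation of part c. The sketch below assumes that stenosis enters the equation in percentage points (60, not 0.60); the function name `predict_psv` is ours.

```python
def predict_psv(stenosis_percent):
    """Predicted peak systolic velocity from the quadratic equation in answer 2c.

    Assumption: stenosis is expressed in percentage points (e.g., 60 for 60%).
    """
    x = stenosis_percent
    return 74.2858 + 0.3837 * x + 0.0477 * x ** 2


psv_60 = predict_psv(60)
print(f"Predicted PSV at 60% stenosis: {psv_60:.1f}")
```

Under that assumption the three terms are 74.2858, 23.022, and 171.72, so the predicted value is about 269.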
3. Predicted value for a 27-year-old Caucasian man who comes to the emergency department on Saturday night with BAC < 50 mg/dL:
Figure B-13. Curvilinear relationship between carotid stenosis index (CSI) and peak systolic velocity. (Reproduced, with permission, from Alexandrov AV, Brodie DS, McLean A, Hamilton P, Murphy J, Burns PN: Correlation of peak systolic velocity and angiographic measurement of carotid stenosis revisited. Stroke 1997;28:339-342. Figure produced with NCSS; used with permission.)
and
Therefore, there is more than a 50-50 chance that this man has an elevated blood alcohol level.
4. The chance agreement that a male is not intoxicated (rounding to whole-number percentages) is 0.82 × 0.74 = 0.61; the chance agreement for intoxication is 0.18 × 0.26 = 0.05. Thus the agreement beyond chance is 0.77 - (0.61 + 0.05) = 0.11. To find kappa, divide by 1 minus the chance agreement (1 - 0.66 = 0.34) to obtain 0.11/0.34 = 0.32, or 32%. Based on the guidelines suggested by Sackett and colleagues (1991), a kappa between 0.21 and 0.40 indicates only slight agreement.
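The kappa calculation in answer 4 can be reproduced step by step. The sketch below rounds the two chance-agreement products to two decimals, as the answer does; the function name `kappa` is ours.

```python
def kappa(observed_agreement, chance_agreement):
    """Agreement beyond chance divided by the maximum possible agreement beyond chance."""
    return (observed_agreement - chance_agreement) / (1 - chance_agreement)


# Chance agreement for each category, rounded as in the answer.
chance_not_intoxicated = round(0.82 * 0.74, 2)
chance_intoxicated = round(0.18 * 0.26, 2)
chance_total = chance_not_intoxicated + chance_intoxicated

observed = 0.77  # observed agreement quoted in the answer
k = kappa(observed, chance_total)

print(f"chance agreement = {chance_total:.2f}, kappa = {k:.2f}")
```

Using the unrounded products (0.6068 and 0.0468) changes the result only in the third decimal place.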
5. If the investigators want to distinguish among three groups of runners, using the numerical anthropometric measures, discriminant analysis should be used. Multiple regression can be used, however, if the actual running time of each runner was used instead of dividing the runners into three groups; in this situation, the outcome measure is numerical.
6. Given the patient is a woman and in the FFS payplan, the regression equation is
which gives 1.68 predicted bed-days during a 30-day period.
7.
a. The regression coefficient is 0.267, indicating a positive relationship.
b. The negative regression coefficient indicates that older ages are associated with lower depression scores, indicating less depression. It is significant because the P value is 0.538. One possible interpretation is that people tend to be more able to deal with their physical conditions and circumstances that occur with increasing age, but this is purely speculative.
c. The block on physical health increases R² by 0.174 or approximately 17%.
8.
a. Yes, with a reported P value of 0.0000.
b. TUMSTAGE(2) and TUMSTAGE(3), the T classification for T2b-c and T3-4, and pretreatment PSA are both statistically significant.
c. exp (1.4588) = 4.3008. The 95% confidence interval goes from approximately 1.45 to 12.73; therefore, we can be 95% confident that the true odds ratio in the population falls within this range. This interval does not contain 1, so the odds ratio is statistically significant (consistent with the P value).
d. The pretreatment PSA, which was not significant when posttreatment PSA was included, is now significant. In addition, the T classification T2b-c is significant.
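The conversion in part c from a logistic regression coefficient to an odds ratio is a single exponentiation. The sketch below uses the coefficient and confidence limits quoted in the answer; the check that the interval excludes 1 mirrors the reasoning given there, and the variable names are ours.

```python
import math

coefficient = 1.4588               # logistic regression coefficient from answer 8c
ci_lower, ci_upper = 1.45, 12.73   # 95% confidence limits quoted in the answer

odds_ratio = math.exp(coefficient)

# An odds ratio of 1 means no association, so an interval that excludes 1
# corresponds to statistical significance at the 0.05 level.
excludes_one = not (ci_lower <= 1.0 <= ci_upper)

print(f"odds ratio = {odds_ratio:.4f}, interval excludes 1: {excludes_one}")
```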
9.
a. Your regression results should resemble those we produced in Table 10-17.
b. Father's height was the variable included in model 1. Stepwise regression begins with the independent variable that has the highest correlation with the outcome, so father's height had the highest correlation with the child's final height.
c. Mother's height, height for chronologic age, and dose are in the final model. Based on the standardized coefficients, the variable making the largest contribution is height for chronologic age.
d. After the other variables entered the equation in model 4, the father's height was no longer significant. This can occur when the other variables are predicting the same portion of the outcome that father's height was predicting, once the values of all other variables are held constant.
e.
This child's final height is therefore predicted to be -1.52, compared with the child's actual height of -2.18.
Chapter 11
1.
a. A nonrepresentative group; those who do not like the airline probably are not on the plane; the flight is not over.
b. Response bias; only those who feel strongly in one direction or the other tend to respond.
c. Typically, patient satisfaction questionnaires are very general in nature and provide little detail on real issues; the response rate is very low, often less than 10%. It may be better to concentrate resources on questionnaires on specific topics sent to random samples of patients with adequate follow-up to produce a reliable response rate.
d. Students may fear that their response will in some way be identified and may hesitate to express their true feelings.
2. Depending on the question you want to answer, you could form confidence intervals for the means or examine correlations among responses to the different variables. If you want to know if the variables cluster together, you could do factor analysis, discussed in Chapter 10.
3. Rogers and colleagues stated that they used a P value of 0.01 due to the large number of variables they analyzed. Another option for adjusting for a large number of tests is to divide the significance level by the number of comparisons; in this situation, Rogers and colleagues made nine comparisons, so they would use 0.05/9 = 0.0056, or approximately 0.006.
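The adjustment described in answer 3 divides the overall significance level by the number of comparisons (a Bonferroni-type correction). A minimal sketch, with a function name of our own choosing:

```python
def adjusted_significance_level(alpha, n_comparisons):
    """Per-comparison significance level when alpha is split evenly across comparisons."""
    return alpha / n_comparisons


per_test = adjusted_significance_level(0.05, 9)
print(f"Each of the 9 comparisons is tested at {per_test:.4f}")
```

Each individual P value is then compared with this smaller threshold instead of 0.05.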
4. Recall that stepwise regression enters the independent variable with the highest correlation with the dependent variable (percent of patients counseled), after the relationships between the independent variables in the equation and the dependent variable have been controlled. In this example, chances are that the high-priority item is highly correlated with the first two variables already in the equation. Therefore, it would have little to contribute to the equation that is not already accounted for. To check this out, run a regression among percent and the three independent variables.
5. It might be easier for parents to estimate the number of hours their child watches TV in an average week rather than a month. Alternatively, the pediatrician could list a set of choices, such as 0-5, 6-10, 11-15, and so on.
6. Both sides of the issue should be represented in a question to keep from leading the respondent. It would be better to ask: “Do you agree or disagree that the new clinic hours are an improvement over the old ones?” Even better would be: “What is your opinion of the new clinic hours compared with the old clinic hours?” and provide five responses such as: much better, somewhat better, no difference, somewhat worse, much worse.
7. Either d or e, depending on the amount of resources you have to gather the membership list.
8.
a. Younger patients with a history of UTI are more likely to receive this prescription.
b. All other regions are compared with South. For instance, a physician from the Midwest is 1.42 times more likely to prescribe recommended fluoroquinolones than a physician from the South. However, the confidence interval contains 1, so it is not statistically significant. It is possible to compare the other three parts of the country only to the South and not to one another.
c. There are several significant associations. Internists are more likely than family physicians to prescribe recommended fluoroquinolones and less likely to prescribe nitrofurantoin or nonrecommended antibiotics; OB-GYN physicians are less likely than family physicians to prescribe trimethoprim-sulfamethoxazole or fluoroquinolones and more likely to prescribe nitrofurantoin or nonrecommended antibiotics.
Chapter 12
1.
a. With a baseline of 2% and 95% sensitivity, 0.95 × 20 = 19 true-positives; with 50% specificity, the false-positive rate is 50% and 0.50 × 980 = 490 false-positives occur. The probability of lupus with a positive test is TP/(TP + FP) = 19/509 = 3.7%.
b. With a baseline of 20%, 190 true-positives occur. Similarly, with 50% specificity, 400 false-positives occur. The chance of lupus with this index of suspicion is therefore 190/590 = 32.2%.
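Both parts of answer 1 use the same counting argument in a hypothetical group of 1000 patients. The sketch below wraps that argument in a function (the name and the default group size of 1000 are ours) so that the effect of the baseline probability is easy to see.

```python
def positive_predictive_value(prevalence, sensitivity, specificity, n=1000):
    """Return (true positives, false positives, predictive value of a positive test)."""
    diseased = prevalence * n
    not_diseased = n - diseased
    true_positives = sensitivity * diseased
    false_positives = (1 - specificity) * not_diseased
    ppv = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, ppv


# Same test (95% sensitivity, 50% specificity) at the two baselines in answer 1.
for prevalence in (0.02, 0.20):
    tp, fp, ppv = positive_predictive_value(prevalence, 0.95, 0.50)
    print(f"baseline {prevalence:.0%}: TP = {tp:.0f}, FP = {fp:.0f}, PPV = {ppv:.1%}")
```

The test characteristics are identical in both runs; only the prior probability changes, and the predictive value rises from about 4% to about 32%.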
2. Using Bayes' theorem with D = lupus and T = test, we have
Using the likelihood ratio method requires us to redefine the pretest odds as the odds of no disease, that is, 0.98/(1 - 0.98) = 49. The likelihood ratio for a negative test is the specificity divided by the false-negative rate (ie, the likelihood of a negative test for persons without the disease versus persons with the disease); therefore, the likelihood ratio is 0.50/0.05 = 10. Multiplying, we get 49 × 10 = 490, the posttest odds. Reconverting to the posttest probability, or the predictive value of a negative test, gives 490/(1 + 490) = 0.998, the same result as with Bayes' theorem.
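The likelihood ratio steps for a negative test can be traced in code. The sketch follows the definitions used in the paragraph above (odds of no disease, and specificity divided by the false-negative rate); the helper names are ours.

```python
def probability_to_odds(p):
    """Convert a probability to odds in favor of the event."""
    return p / (1 - p)


def odds_to_probability(odds):
    """Convert odds in favor of the event back to a probability."""
    return odds / (1 + odds)


pretest_odds_no_disease = probability_to_odds(0.98)   # 0.98/0.02
likelihood_ratio = 0.50 / 0.05                        # specificity / false-negative rate
posttest_odds = pretest_odds_no_disease * likelihood_ratio
predictive_value_negative = odds_to_probability(posttest_odds)

print(f"posttest odds = {posttest_odds:.0f}")
print(f"predictive value of a negative test = {predictive_value_negative:.3f}")
```

Because the odds here are stated for no disease, the ratio is the reciprocal of the likelihood ratio for a negative test as it is usually defined for the odds of disease.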
3.
a. Increasing the threshold for a positive ESR from < 20 mm/hr to < 0 mm/h will decrease the sensitivity.
b. There will be fewer false-positives because the specificity will increase.
4.
a. Positive results occurred 138 times in 150 known diabetics = 138/150 = 92% sensitivity.
b. 150 - 24 = 126 negative results in 150 persons without diabetes gives 126/150 = 84% specificity.
c. The false-positive rate is 24/150, or 100% − specificity = 16%.
d. 80% sensitivity in 150 persons with diabetes gives 120 true-positives. 4% false-positives in 150 persons without diabetes is 6 persons. The chances of diabetes with a positive fasting blood sugar is thus 120/126 = 95.2%.
e. 80% sensitivity in 90 (out of 100) patients with diabetes = 72 true-positives; 4% false-positive rate in 10 patients without diabetes = 0.4 false-positive. Therefore, 72/72.4 = 0.9945, or 99.45%, of patients like this man who have a positive fasting blood sugar actually have diabetes.
5.
a. Using Bayes' theorem with prior probability of 0.30, we have
that is, an 88.5% chance.
b.
or a 95.7% chance.
6.
a. There are 80 × 0.19 = 15.2 true-positives, and 80 - 15.2 = 64.8 false-negatives; 20 × 0.82 = 16.4 true-negatives and 20 - 16.4 = 3.6 false-positives. Therefore, the probability of an MI with a positive ECG is 15.2/(15.2 + 3.6) = 80.9%.
b. The probability of an MI even if the test is negative is 64.8/(64.8 + 16.4) = 79.8%.
c. These calculations illustrate the uselessness of this criterion (ST elevation < 5 mm in discordant leads) in diagnosing MI.
d. The likelihood ratio is the true-positive rate divided by the false-positive rate, or 0.19/0.18 = 1.06.
Figure B-14. Decision tree for aneurysms with probabilities and utilities included. Abbreviation: EU = expected utility. (Adapted and reproduced, with permission, from van Crevel H, Habbema JDF, Braakman R: Decision analysis of the management of incidental intracranial saccular aneurysms. Neurology 1986;36:1335-1339.)
e. The pretest odds are 80/20, or 4 to 1. The posttest odds are 4 × 1.06 = 4.24. As you can see, the test does very little to change the index of suspicion.
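The calculations in answer 6 can be collected in one place. This sketch uses the counts and rates stated in parts a through e (80 patients with an MI, 20 without, 19% sensitivity, 82% specificity); the variable names are ours.

```python
n_mi, n_no_mi = 80, 20
sensitivity, specificity = 0.19, 0.82

true_positives = n_mi * sensitivity        # 15.2
false_negatives = n_mi - true_positives    # 64.8
true_negatives = n_no_mi * specificity     # 16.4
false_positives = n_no_mi - true_negatives # 3.6

p_mi_given_positive = true_positives / (true_positives + false_positives)
p_mi_given_negative = false_negatives / (false_negatives + true_negatives)

# Likelihood ratio for a positive test: true-positive rate over false-positive rate.
likelihood_ratio = sensitivity / (1 - specificity)
pretest_odds = n_mi / n_no_mi
posttest_odds = pretest_odds * likelihood_ratio

print(f"P(MI | positive ECG) = {p_mi_given_positive:.1%}")
print(f"P(MI | negative ECG) = {p_mi_given_negative:.1%}")
print(f"LR = {likelihood_ratio:.2f}, posttest odds = {posttest_odds:.2f}")
```

Without intermediate rounding the posttest odds are about 4.22 rather than 4.24, and converting them back to a probability reproduces the 80.9% from part a, which shows that the two methods agree.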
7.
a. The best test to rule in is the one with the highest specificity to minimize the number of false-positives: ankle dorsiflexion weakness.
b. The best test to rule out is the one with the highest sensitivity to minimize the number of false-negatives: ipsilateral straight-leg raising.
8. See Figure B-14 for the expected utilities for the decision tree.
9.
a. 0.20, seen on the top branch.
b. No, this is not obvious. Although the author provides the proportion of patients who have a positive examination, 0.26 (and negative examination, 0.74), these numbers include false-positives (and false-negatives) as well as true-positives (and true-negatives). The author also gives the predictive value of a positive test, 0.78 (and of a negative test, 0.997); and by using quite a bit of algebra, we can work backward to obtain the estimates of 0.989 for sensitivity and 0.928 for specificity. Readers of the article should not be expected to do these manipulations, however; authors should give precise values used in any analysis.
c. For no test or therapy: (0.20)(48) + (0.80)(100) = 89.6; for colectomy: (0.03)(0) + (0.97) [(0.20)(78) + (0.80)(100)] = 92.7. Therefore, colonoscopy is the arm with the highest utility, at 94.6.
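The expected utilities in part c are probability-weighted averages, folded back from the right side of the tree. The sketch below reproduces the two arms computed in the answer and takes the colonoscopy value of 94.6 as quoted, because its branch probabilities are not restated here; the function name is ours.

```python
def expected_utility(branches):
    """Probability-weighted average of utilities for a list of (probability, utility) pairs."""
    return sum(probability * utility for probability, utility in branches)


eu_no_test = expected_utility([(0.20, 48), (0.80, 100)])

# Colectomy: fold back the inner chance node first, then the outer one.
eu_inner = expected_utility([(0.20, 78), (0.80, 100)])
eu_colectomy = expected_utility([(0.03, 0), (0.97, eu_inner)])

eu_colonoscopy = 94.6  # quoted in the answer, not recomputed here

arms = {
    "no test or therapy": eu_no_test,
    "colectomy": eu_colectomy,
    "colonoscopy": eu_colonoscopy,
}
best_arm = max(arms, key=arms.get)
print({name: round(value, 1) for name, value in arms.items()}, "->", best_arm)
```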
Chapter 13
1. C
2. A
3. D
4. D
5. D
6. B
7. E
8. A
9. C
10. E
11. C
12. A
13. A
14. B
15. E
16. B
17. E
18. C
19. C
20. A
21. E
22. B
23. C
24. A
25. C
26. D
27. D
28. C
29. A
30. D
31. B
32. C
33. A
34. E
35. D
36. A
37. D
38. B
39. E
40. D
41. D
42. C
43. G
44. A
45. C
46. A
47. B
48. B
49. A
50. B
51. C
52. C
53. B
54. D
55. E
56. A
57. E
58. B
59. C
60. B
61. B
62. E
63. A
64. A
65. B
66. H
67. B
68. I
69. E
70. G
71. T
72. F
73. T
74. F
75. F