Comments on Statistical Issues in May 2015

Article information

Korean J Fam Med. 2015;36(3):154-155
Publication date (electronic) : 2015 May 22
doi : https://doi.org/10.4082/kjfm.2015.36.3.154
Department of Biostatistics, The Catholic University of Korea College of Medicine, Seoul, Korea.

In this section, we explain the FIRTH option in the SAS PROC LOGISTIC procedure, which is used to solve numerical problems such as non-convergence in estimating regression coefficients, unreasonably large standard errors, and wide confidence intervals when there are zero cells in contingency tables. These issues relate to the article titled "Results of an inpatient smoking cessation program: 3-month cessation rate and predictors of success," published in March 2015 by Kim et al.1)

COMPLETE SEPARATIONS

Complete separation occurs when the values of the explanatory variable(s) classify all subjects perfectly into their response groups. Consider the following data set (Table 1).2)

Table 1

A contingency table with a complete separation

        Y=1     Y=0
X=1     5 (a)   0 (b)
X=0     0 (c)   5 (d)

The estimated odds ratio (OR) using X=0 as the reference is infinite: $\mathrm{OR} = (a/b)/(c/d) = (5/0)/(0/5) = 25/0$. Note that if either (b) or (c) is equal to zero, then the OR is undefined. We also obtain an infinite value for the standard error of the OR: $\mathrm{SE}(\mathrm{OR}) = \mathrm{OR} \times \sqrt{1/a + 1/b + 1/c + 1/d} = (25/0) \times \sqrt{1/5 + 1/0 + 1/0 + 1/5} = \infty \times \infty$. Also note that if any one of the four cell frequencies is equal to zero, then SE(OR) is undefined. Gart and Zweifel3) suggested improved estimates of the OR and SE(OR), which are calculated after adding 0.5 to each cell. According to their suggestion, we obtain $\mathrm{OR}' = (5.5/0.5)/(0.5/5.5) = 121$ and $\mathrm{SE}(\mathrm{OR}') = 252.76$.
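As a quick arithmetic check of the adjustment described above, the following Python sketch adds 0.5 to each cell of Table 1 and recomputes the OR and its standard error. The function name `adjusted_odds_ratio` and its parameter `k` are hypothetical, introduced only for illustration; the standard error is taken to be the OR multiplied by the square root of the summed reciprocal cell counts, as in the text.

```python
import math

def adjusted_odds_ratio(a, b, c, d, k=0.5):
    """Add k to each cell, then return (OR, SE(OR)) for a 2x2 table.

    Hypothetical helper for illustration; k=0.5 is the correction
    attributed to Gart and Zweifel in the text.
    """
    a, b, c, d = (cell + k for cell in (a, b, c, d))
    odds_ratio = (a / b) / (c / d)
    # SE(OR) = OR * sqrt(1/a + 1/b + 1/c + 1/d), using the adjusted cells
    se = odds_ratio * math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, se

# Cells of Table 1: a=5, b=0, c=0, d=5
or_adj, se_adj = adjusted_odds_ratio(5, 0, 0, 5)
print(f"OR' = {or_adj:.3f}, SE(OR') = {se_adj:.2f}")
```

With the Table 1 counts this gives values that agree with the 121 and 252.76 quoted in the text.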

SAS PROC FREQ PROCEDURE

We can obtain the OR and its 95% confidence limits for Table 1 using the following PROC FREQ commands (here, we do not consider the small sample size problem).

DATA CS; INPUT X Y Z; CARDS;
1 1 5
1 0 0
0 1 0
0 0 5
;
PROC FREQ DATA=CS; TABLES X*Y/ALL; WEIGHT Z; RUN;

<OUTPUT>
Odds ratio = 121.000, 95% confidence limits = 2.0169 ~ 7259.1773

Other options in the TABLES statement, namely RELRISK or MEASURES, cannot calculate the OR and SE(OR) in this case.
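The PROC FREQ limits reported above can be checked by hand. The sketch below assumes (this is not stated in the article) that they are a Wald interval built on the log scale from the table with 0.5 added to each cell, that is, exp(ln OR' ± z × SE(ln OR')). The variable names are illustrative only.

```python
import math
from statistics import NormalDist

# Assumption: the limits come from the 0.5-adjusted cells of Table 1.
a, b, c, d = 5.5, 0.5, 0.5, 5.5

or_adj = (a / b) / (c / d)                           # adjusted odds ratio
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR')
z = NormalDist().inv_cdf(0.975)                      # two-sided 95% quantile

lower = math.exp(math.log(or_adj) - z * se_log_or)
upper = math.exp(math.log(or_adj) + z * se_log_or)
print(f"OR' = {or_adj:.3f}, 95% CL = {lower:.4f} ~ {upper:.4f}")
```

Under that assumption the computed limits agree closely with the reported 2.0169 and 7259.1773, which suggests the interval is symmetric on the log scale rather than around the OR itself.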

SAS PROC LOGISTIC PROCEDURE

We can also obtain the OR and its 95% confidence limits for Table 1 using the following PROC LOGISTIC commands.

PROC LOGISTIC DESCENDING DATA=CS; MODEL Y=X/FIRTH; WEIGHT Z; RUN;

<OUTPUT>
Odds ratio = 120.976, 95% confidence limits = 1.364 ~ >999.999

We can see that both procedures give nearly the same OR (121.000 and 120.976), but somewhat different 95% confidence limits. However, if we do not use the FIRTH option in the MODEL statement, then the OR and the upper 95% confidence limit are infinite.
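To illustrate what a Firth-type penalized fit does with the Table 1 data, here is a minimal Python sketch. It is not SAS's implementation. It assumes the cell counts act as frequencies, uses a scoring iteration on the modified score $\sum_i \{y_i - \pi_i + h_i(1/2 - \pi_i)\}x_i$ (with $h_i$ the leverage of subject $i$), and forms a Wald interval from the inverse Fisher information at the penalized estimate. All function and variable names are hypothetical.

```python
import math
from statistics import NormalDist

# Cells of Table 1 as (x, y, count), mirroring the CS data set.
CELLS = [(1, 1, 5), (1, 0, 0), (0, 1, 0), (0, 0, 5)]

def prob(b0, b1, x):
    """P(Y=1 | x) under logit P = b0 + b1*x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def inverse_information(b0, b1, cells):
    """Inverse of the 2x2 Fisher information for (intercept, slope)."""
    i00 = i01 = i11 = 0.0
    for x, _, n in cells:
        p = prob(b0, b1, x)
        w = n * p * (1.0 - p)
        i00 += w
        i01 += w * x
        i11 += w * x * x
    det = i00 * i11 - i01 * i01
    return i11 / det, -i01 / det, i00 / det  # v00, v01, v11

def firth_fit(cells, tol=1e-10, max_iter=200):
    """Scoring iteration on the Firth-modified score; returns (b0, b1, SE(b1))."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        v00, v01, v11 = inverse_information(b0, b1, cells)
        u0 = u1 = 0.0
        for x, y, n in cells:
            p = prob(b0, b1, x)
            # Leverage of one subject with covariate value x
            h = p * (1.0 - p) * (v00 + 2.0 * v01 * x + v11 * x * x)
            r = n * ((y - p) + h * (0.5 - p))  # modified score contribution
            u0 += r
            u1 += r * x
        d0 = v00 * u0 + v01 * u1
        d1 = v01 * u0 + v11 * u1
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    se_b1 = math.sqrt(inverse_information(b0, b1, cells)[2])
    return b0, b1, se_b1

b0, b1, se_b1 = firth_fit(CELLS)
z = NormalDist().inv_cdf(0.975)
odds_ratio = math.exp(b1)
lower = math.exp(b1 - z * se_b1)
upper = math.exp(b1 + z * se_b1)
print(f"OR = {odds_ratio:.3f}, 95% CL = {lower:.3f} ~ {upper:.3f}")
```

Under these assumptions the sketch converges to an OR of about 121, with a lower limit near 1.36 and an upper limit far above 999.999, in line with the PROC LOGISTIC output. For a single binary explanatory variable the penalized estimate coincides with the OR obtained after adding 0.5 to each cell, while the standard error is computed from the original counts, which is one way to see why the point estimates agree and the confidence limits differ.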

Notes

CONFLICT OF INTEREST: No potential conflict of interest relevant to this article was reported.

References

1. Kim SH, Lee JA, Kim KU, Cho HJ. Results of an inpatient smoking cessation program: 3-month cessation rate and predictors of success. Korean J Fam Med 2015;36:50–59. 25802686.
2. Park YG. Comments on statistical issues in July 2013. Korean J Fam Med 2013;34:293–294. 23904960.
3. Gart JJ, Zweifel JR. On the bias of various estimators of the logit and its variance with application to quantal bioassay. Biometrika 1967;54:181–187. 6049534.
