In this section, we explain the FIRTH option in the SAS PROC LOGISTIC procedure, which is used to solve numerical problems that arise when a contingency table contains zero cells, such as non-convergence in estimating regression coefficients, unreasonably large standard errors, and wide confidence intervals. This option appeared in the article titled "Results of an inpatient smoking cessation program: 3-month cessation rate and predictors of success," published in March 2015 by Kim et al.1)
COMPLETE SEPARATIONS
Complete separation occurs when the values of the explanatory variable(s) classify all subjects perfectly into the response groups. Consider the following data set (Table 1).2)
The estimated odds ratio (OR) using X=0 as the reference is infinite: $\mathrm{OR} = (a/b)/(c/d) = (5/0)/(0/5) = 25/0$, where a, b, c, and d denote the four cell frequencies of Table 1. Note that if either b or c is equal to zero, then the OR is undefined. We also obtain an infinite value for the standard error of the OR: $\mathrm{SE}(\mathrm{OR}) = \mathrm{OR}\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}} = \frac{25}{0}\times\sqrt{\frac{1}{5}+\frac{1}{0}+\frac{1}{0}+\frac{1}{5}} = \text{infinite}\times\text{infinite}$. Also note that if any one of the four cell frequencies is equal to zero, then SE (OR) is undefined. Gart and Zweifel3) suggested improved estimates of the OR and SE (OR), which are calculated after adding 0.5 to each cell. Following their suggestion, we obtain OR'=(5.5/0.5)/(0.5/5.5)=121 and SE (OR')=252.76.
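The corrected estimates can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the SAS workflow described here. The variable names are illustrative, and the cell counts a=5, b=0, c=0, d=5 are taken from the formula above.

```python
import math

# Cell counts as used in the text: a=5, b=0, c=0, d=5.
a, b, c, d = 5, 0, 0, 5

# Correction described in the text: add 0.5 to every cell.
a2, b2, c2, d2 = (n + 0.5 for n in (a, b, c, d))

# Corrected odds ratio OR' = (a'/b') / (c'/d').
or_adj = (a2 / b2) / (c2 / d2)

# SE(OR') = OR' * sqrt(1/a' + 1/b' + 1/c' + 1/d').
se_log_or = math.sqrt(1 / a2 + 1 / b2 + 1 / c2 + 1 / d2)
se_or = or_adj * se_log_or

print(f"OR' = {or_adj:.3f}")
print(f"SE(OR') = {se_or:.2f}")
```

Run as written, this should reproduce the values 121 and 252.76 quoted above.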
SAS PROC FREQ PROCEDURE
We can obtain the OR and its 95% confidence limits for Table 1 using the following commands in the PROC FREQ procedure (here, we do not consider the small sample size problem).
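DATA CS;
  INPUT X Y Z;  /* X: explanatory variable, Y: response, Z: cell frequency */
  CARDS;
1 1 5
1 0 0
0 1 0
0 0 5
;
PROC FREQ DATA=CS;
  TABLES X*Y / ALL;  /* cross-tabulate X by Y and request all statistics */
  WEIGHT Z;          /* treat Z as the cell count */
RUN;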
<OUTPUT>
Odds ratio = 121.000, 95% confidence limits = 2.0169 ~ 7259.1773
Other options in the TABLES statement, such as RELRISK or MEASURES, cannot calculate the OR and SE (OR) in this case.
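The PROC FREQ limits reported above are numerically consistent with a Wald-type interval computed on the log scale after adding 0.5 to each cell. The Python sketch below works under that assumption. The text does not state how SAS computes these limits, so this is a cross-check rather than a description of the procedure's internals.

```python
import math
from statistics import NormalDist

# Cell counts after adding 0.5 to each cell (assumption: same correction as in the text).
a2, b2, c2, d2 = 5.5, 0.5, 0.5, 5.5

or_adj = (a2 / b2) / (c2 / d2)

# Standard error of log(OR'), i.e. SE(OR') / OR'.
se_log_or = math.sqrt(1 / a2 + 1 / b2 + 1 / c2 + 1 / d2)

# Two-sided 95% normal quantile (about 1.96).
z = NormalDist().inv_cdf(0.975)

# Assumed Wald interval on the log scale, transformed back to the OR scale.
lower = math.exp(math.log(or_adj) - z * se_log_or)
upper = math.exp(math.log(or_adj) + z * se_log_or)

print(f"95% CL: {lower:.4f} ~ {upper:.4f}")
```

Because the interval is symmetric only on the log scale, the upper limit lies much further from 121 than the lower limit does.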
SAS PROC LOGISTIC PROCEDURE
We can also obtain the OR and its 95% confidence limits for Table 1 using the following commands in the PROC LOGISTIC procedure.
PROC LOGISTIC DESCENDING DATA=CS;
  MODEL Y=X / FIRTH;  /* FIRTH requests Firth's penalized likelihood estimation */
  WEIGHT Z;
RUN;
<OUTPUT>
Odds ratio = 120.976, 95% confidence limits = 1.364 ~ >999.999
We can see that both procedures give nearly the same OR (121.000 and 120.976), but somewhat different 95% confidence limits. However, if we do not use the FIRTH option in the MODEL statement, then the values of the OR and the upper 95% confidence limit are infinite.
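To illustrate why the penalty matters, the following Python sketch fits the same two-parameter logistic model to the Table 1 data by scoring iterations, once with a Firth-type modified score and once by ordinary maximum likelihood. It is a simplified illustration written for this data set, not SAS's implementation. The function name `fit_logistic` and its arguments are hypothetical.

```python
import math

# Table 1 expanded to one (x, y) row per subject:
# 5 subjects with X=1, Y=1 and 5 with X=0, Y=0 (zero cells contribute no rows).
DATA = [(1, 1)] * 5 + [(0, 0)] * 5


def fit_logistic(data, firth=True, max_iter=100, tol=1e-10):
    """Scoring iterations for logit P(Y=1) = b0 + b1*x.

    firth=True uses the Firth-type modified score; firth=False is ordinary ML.
    Returns (b0, b1, converged).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Fisher information X'WX for design rows (1, x), with W = diag(p*(1-p)).
        i00 = i01 = i11 = 0.0
        probs = []
        for x, _y in data:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            i00 += w
            i01 += w * x
            i11 += w * x * x
            probs.append(p)
        det = i00 * i11 - i01 * i01
        v00, v01, v11 = i11 / det, -i01 / det, i00 / det  # inverse of X'WX

        # Score vector; the Firth term adds h*(0.5 - p), h = hat-matrix diagonal.
        u0 = u1 = 0.0
        for (x, y), p in zip(data, probs):
            resid = y - p
            if firth:
                h = p * (1.0 - p) * (v00 + 2.0 * v01 * x + v11 * x * x)
                resid += h * (0.5 - p)
            u0 += resid
            u1 += resid * x

        # Update step: inverse information times score.
        step0 = v00 * u0 + v01 * u1
        step1 = v01 * u0 + v11 * u1
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            return b0, b1, True
    return b0, b1, False


b0_f, b1_f, ok_f = fit_logistic(DATA, firth=True)
b0_ml, b1_ml, ok_ml = fit_logistic(DATA, firth=False, max_iter=15)
print("Firth OR:", math.exp(b1_f), "converged:", ok_f)
print("ML slope after 15 iterations:", b1_ml, "converged:", ok_ml)
```

In this sketch the penalized fit settles at an OR of approximately 121, close to the PROC LOGISTIC value above. The unpenalized slope keeps growing with every iteration and never converges, which is the numerical face of an infinite OR.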
Table 1. A contingency table with a complete separation