Commentary

Comments on Statistical Issues in November 2013

Korean Journal of Family Medicine 2013;34(6):434-436.
Published online: November 25, 2013

Department of Biostatistics, The Catholic University of Korea College of Medicine, Seoul, Korea.

Copyright © 2013 The Korean Academy of Family Medicine

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

In this section, we explain how many observations are actually used in a multivariate analysis when one or more explanatory variables have missing values. This issue arose in two articles published in September 2013: "Postmarketing surveillance study of the efficacy and safety of phentermine in patients with obesity," by Kim et al.1) and "Relationships between dietary habits and allostatic load index in metabolic syndrome patients," by Kim.2)
When there are some missing values in one or more variables, most researchers choose one of the following strategies for analysis: 1) delete all observations which have missing values or 2) use all observations regardless of missing values. The purpose of this section is to show how many observations are actually analyzed in multivariate analyses, such as multiple linear regression analysis or multiple logistic regression analysis, when there are different numbers of missing values in each explanatory variable. Let's perform a multiple linear regression using the following hypothetical data (Table 1).
In these data, explanatory variable x1 has four missing values, x2 has two, and x3 has none (missing values are denoted by a dot). We will run three analyses in SPSS: 1) Pearson correlation analysis, 2) multiple linear regression analysis, and 3) stepwise multiple linear regression analysis.
From the menus choose:
Analyze
Correlate
Bivariate...
Select all variables: y, x1, x2, x3
We obtain the following results (Table 2).
x3 has the highest correlation with y, and the explanatory variables x1, x2, and x3 are each analyzed using only their own valid observations: 6, 8, and 10, respectively.
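The behavior described above, commonly called pairwise deletion, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the numbers are invented and are not the values in Table 1, although the missingness pattern (four missing in x1, two in x2, none in x3) follows the text. The helper name `pairwise_pearson` is hypothetical.

```python
from math import sqrt

# Illustrative data only (NOT the values of Table 1); None marks a missing value.
# Missingness pattern follows the text: x1 has 4 missing, x2 has 2, x3 has 0.
y  = [12, 15, 11, 18, 20, 14, 17, 22, 19, 25]
x1 = [3, 4, 2, 5, 6, 4, None, None, None, None]
x2 = [7, 9, 6, 10, 12, 8, 11, 13, None, None]
x3 = [1, 2, 1, 3, 4, 2, 3, 5, 4, 6]

def pairwise_pearson(a, b):
    """Return (r, n) using only the rows where both a and b are observed."""
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    n = len(pairs)
    mean_a = sum(u for u, _ in pairs) / n
    mean_b = sum(v for _, v in pairs) / n
    sxy = sum((u - mean_a) * (v - mean_b) for u, v in pairs)
    sxx = sum((u - mean_a) ** 2 for u, _ in pairs)
    syy = sum((v - mean_b) ** 2 for _, v in pairs)
    return sxy / sqrt(sxx * syy), n

# Each correlation with y is based on a different number of observations.
for name, x in (("x1", x1), ("x2", x2), ("x3", x3)):
    r, n = pairwise_pearson(y, x)
    print(f"y vs {name}: n = {n}, r = {r:.3f}")
```

With this pattern the three correlations are based on 6, 8, and 10 observations, so they are not computed from the same set of subjects.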
From the menus choose:
Analyze
Regression
Linear...
Choose dependent variable: y
Independent variables: x1, x2, x3
Options: statistics: descriptive statistics
We obtain the following results (Tables 3-5).
x3 has the highest correlation with y, but all analyses (descriptive statistics, correlation analysis, and multiple regression analysis) are performed using only the six observations that have no missing values for any explanatory variable.
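The same idea, commonly called listwise deletion, can be sketched as follows. The data are again invented for illustration (they are not the values in Table 1) and only reproduce the missingness pattern described in the text; the sketch counts the rows that survive and shows that even a descriptive statistic for x3, which has no missing values, changes once the incomplete rows are dropped.

```python
# Illustrative data only (NOT the values of Table 1); None marks a missing value.
y  = [12, 15, 11, 18, 20, 14, 17, 22, 19, 25]
x1 = [3, 4, 2, 5, 6, 4, None, None, None, None]
x2 = [7, 9, 6, 10, 12, 8, 11, 13, None, None]
x3 = [1, 2, 1, 3, 4, 2, 3, 5, 4, 6]

# Listwise deletion: keep a row only if every variable in the model is observed.
rows = list(zip(y, x1, x2, x3))
complete = [row for row in rows if None not in row]

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

print("observations entered: ", len(rows))
print("observations analyzed:", len(complete))
# x3 is fully observed, yet its summary is computed on the reduced set.
print("mean of x3, all rows:     ", round(mean(x3), 3))
print("mean of x3, complete rows:", round(mean([row[3] for row in complete]), 3))
```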
(Menus, variable selection, and options are the same as the above)
Variable selection methods: stepwise
We obtain the following results (Tables 6-8).
Descriptive statistics (the same as the results above)
Correlation coefficients (the same as the results above)
The total degrees of freedom in the analysis of variance table (df = n - 1 = 5) show that the stepwise multiple regression analysis is performed using only the six observations that have no missing values for any explanatory variable, even though only one variable, x3, which itself has no missing values, remains in the final model.
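To make the consequence concrete, the sketch below fits y on x3 twice: once on the rows available when x1, x2, and x3 are all candidates, and once on the rows available when x3 is the only explanatory variable requested. The data are invented (not the values in Table 1) and the helper `fit_simple` is a hypothetical name; the sketch does not reproduce the SPSS stepwise procedure itself, only the difference in sample size and total degrees of freedom.

```python
# Illustrative data only (NOT the values of Table 1); None marks a missing value.
y  = [12, 15, 11, 18, 20, 14, 17, 22, 19, 25]
x1 = [3, 4, 2, 5, 6, 4, None, None, None, None]
x2 = [7, 9, 6, 10, 12, 8, 11, 13, None, None]
x3 = [1, 2, 1, 3, 4, 2, 3, 5, 4, 6]

def fit_simple(xs, ys):
    """Least-squares line ys = a + b * xs; returns (a, b, n)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (v - my) for x, v in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b, n

# Rows usable when x1, x2, and x3 are all candidate variables.
candidates = [(yi, x3i) for yi, x1i, x2i, x3i in zip(y, x1, x2, x3)
              if None not in (yi, x1i, x2i, x3i)]
# Rows usable when x3 is the only explanatory variable requested.
x3_only = [(yi, x3i) for yi, x3i in zip(y, x3)
           if yi is not None and x3i is not None]

for label, data in (("candidates x1, x2, x3", candidates), ("x3 alone", x3_only)):
    ys = [d[0] for d in data]
    xs = [d[1] for d in data]
    a, b, n = fit_simple(xs, ys)
    print(f"{label}: n = {n}, total df = {n - 1}, slope = {b:.3f}")
```

Under these assumptions the first fit has a total df of 5 and the second a total df of 9, so refitting the final model with x3 alone would use four more observations.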
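As we can see from the above results, the actual number of observations analyzed in a multivariate analysis is the number of observations with no missing values for any explanatory variable we intended to include in the analysis, regardless of the variable selection method. In this example that number equals the minimum number of valid observations among the explanatory variables (six, for x1), because the two observations missing x2 are also missing x1.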
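In the example above, the number of complete observations coincides with the smallest valid count because the missing values overlap. A short sketch with a different, purely hypothetical missingness pattern (not taken from the article) shows that when the gaps fall on different observations, the number analyzed can be smaller than the smallest valid count.

```python
# Hypothetical pattern (not from the article): the gaps in x1 and x2 do not overlap.
# None marks a missing value.
x1 = [3, 4, 2, 5, 6, 4, None, None, None, None]
x2 = [None, None, 6, 10, 12, 8, 11, 13, 9, 14]

valid_x1 = sum(v is not None for v in x1)   # valid observations of x1
valid_x2 = sum(v is not None for v in x2)   # valid observations of x2
# Rows where both variables are observed, i.e. the rows a regression would keep.
complete = sum(a is not None and b is not None for a, b in zip(x1, x2))

print(valid_x1, valid_x2, complete)  # 6 8 4
```

It is therefore safer to count the complete observations directly than to rely on the per-variable valid counts.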

No potential conflict of interest relevant to this article was reported.

1. Kim HO, Lee JA, Suh HW, Kim YS, Kim BS, Ahn ES, et al. Postmarketing surveillance study of the efficacy and safety of phentermine in patients with obesity. Korean J Fam Med 2013;34:298-306. PMID: 24106582.
2. Kim JY. Relationships between dietary habits and allostatic load index in metabolic syndrome patients. Korean J Fam Med 2013;34:334-346. PMID: 24106586.
Table 1. Hypothetical data
Table 2. Correlation coefficients
Table 3. Descriptive statistics
Table 4. Correlation coefficients
Table 5. Coefficients
Table 6. Entered/removed variables
Table 7. Analysis of variance
Table 8. Coefficients
