Logistic regression is fine, but in SPSS make sure you declare any independent variable with more than two categories as "Categorical", and for each one specify the "cornerpoint" (lowest-risk) category as the baseline (reference) category. The remaining variables, i.e., continuous variables such as age and binary variables, can be entered into the model directly.
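To make the baseline idea concrete, here is a minimal Python sketch of indicator (dummy) coding with a chosen reference category, which is roughly what SPSS does when you declare a variable "Categorical" with indicator contrasts. The function name `dummy_code` and the example levels are hypothetical, for illustration only.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator column per non-reference level.

    The reference (baseline) level gets no column of its own: a subject in
    that level has 0 in every indicator, so each coefficient compares one
    level against the baseline.
    """
    levels = sorted(set(values) - {reference})  # sorted for a stable column order
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical 3-level risk variable with "low" chosen as the baseline.
risk = ["low", "high", "mid", "low", "mid"]
columns = dummy_code(risk, reference="low")
for name, col in columns.items():
    print(name, col)
# high [0, 1, 0, 0, 0]
# mid [0, 0, 1, 0, 1]
```

A 3-category variable therefore contributes two coefficients, each of which (after exponentiating) is an odds ratio relative to the baseline category.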
The easiest approach is to run forward stepwise logistic regression, using e.g. the Wald or likelihood-ratio method of forward stepping. This will only retain predictors that meet the significance criterion for entry. Stepwise selection is somewhat biased, however, since you are only selecting the "fish from the top of the bucket." Arguably, backward stepping (starting with all variables in the model and removing them one at a time) does a better job of preserving the correlations among subsets of predictors, resulting in a model that reflects those correlations.
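The forward likelihood-ratio idea can be sketched in Python as below. To keep it self-contained, the model fit is abstracted as a caller-supplied function returning -2 log-likelihood (-2LL) for a given set of predictors; `forward_select`, `toy_neg2ll`, the variable names, and the toy numbers are all hypothetical. The sketch assumes each candidate adds exactly one parameter (df = 1), uses an entry threshold of p < 0.05, and only performs entry steps, whereas stepwise routines in packages such as SPSS may also re-test already-entered variables for removal.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def forward_select(candidates, neg2ll, p_enter=0.05):
    """Forward selection by likelihood-ratio tests.

    neg2ll(list_of_vars) must return -2 log-likelihood of the logistic model
    containing those variables (an empty list means intercept only).
    """
    selected = []
    remaining = list(candidates)
    current = neg2ll(selected)
    while remaining:
        # LR chi-square for adding each remaining variable to the current model
        tests = []
        for var in remaining:
            chi2 = current - neg2ll(selected + [var])
            tests.append((chi2_sf_df1(max(chi2, 0.0)), var))
        best_p, best_var = min(tests)  # smallest p-value wins
        if best_p >= p_enter:
            break  # no candidate meets the entry criterion
        selected.append(best_var)
        remaining.remove(best_var)
        current = neg2ll(selected)
    return selected


# Toy stand-in for a real model fit: each variable lowers -2LL by a fixed amount.
GAIN = {"family_conflict": 20.0, "peer_use": 5.0, "school_size": 0.5}


def toy_neg2ll(variables):
    return 100.0 - sum(GAIN[v] for v in variables)


print(forward_select(list(GAIN), toy_neg2ll))  # ['family_conflict', 'peer_use']
```

In practice `neg2ll` would refit the logistic model for each candidate set; the point of the sketch is only the entry rule.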
Most psychologists or psychometricians would probably use the hierarchical approach in SPSS, where all the family variables are entered as one block, all the school environment variables as another block, the peer group variables as another, and so on. Each block gives a chi-square test statistic (degrees of freedom = number of parameters the block adds) that you can use to determine whether that group of variables is a significant predictor. Using this approach also reflects that, by design, you have a theoretical idea about the constructs and domains that predict risk. In fact, in psychometrics, I would probably never simply throw all the variables into a logistic regression equation and see what happens.
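Each block's chi-square is the drop in -2LL when the block is added, with degrees of freedom equal to the number of parameters the block adds (a categorical variable with k categories contributes k - 1 indicators). A minimal Python sketch of that arithmetic follows; the block names and -2LL values are made up for illustration, and `chi2_sf` uses the standard recurrence for the upper-tail chi-square probability with integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1.

    Uses Q(k + 2, x) = Q(k, x) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(1, x) = erfc(sqrt(x/2)) or Q(2, x) = exp(-x/2).
    """
    if x <= 0:
        return 1.0
    if df % 2:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1
    else:
        q, k = math.exp(-x / 2.0), 2
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def block_tests(neg2ll_start, blocks):
    """blocks: (name, parameters_added, -2LL after adding the block), in entry order."""
    results = []
    previous = neg2ll_start
    for name, n_params, neg2ll_after in blocks:
        chi2 = previous - neg2ll_after  # improvement in fit due to this block
        results.append((name, chi2, n_params, chi2_sf(chi2, n_params)))
        previous = neg2ll_after
    return results


# Hypothetical hierarchy: intercept-only -2LL = 400, then three blocks.
blocks = [("family", 3, 385.0), ("school", 2, 380.0), ("peer", 4, 378.0)]
for name, chi2, df, p in block_tests(400.0, blocks):
    print(f"{name:8s} chi2={chi2:5.1f} df={df} p={p:.4f}")
```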
Another approach is to fit univariate (single-predictor) models first, and then enter every variable whose coefficient is significant at a liberal threshold (e.g., p < 0.25) into a full model. This full model can contain risk factors, adjustment variables (age, family income, grade point average), and nuisance factors (variables you don't want to study but that differ significantly between drinkers and non-drinkers).
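The univariate screening step can be sketched as follows: fit a one-predictor logistic model for each candidate by Newton-Raphson, take the Wald p-value of its slope, and keep variables with p < 0.25. This is a plain-Python illustration that assumes no missing data and no complete separation; the function names, variable names, and toy data are hypothetical.

```python
import math


def fit_one_predictor(x, y, max_iter=50, tol=1e-10):
    """Logistic regression of y on a single predictor x (plus intercept).

    Returns (slope, standard error of slope) from Newton-Raphson.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p             # score (gradient) components
            g1 += (yi - p) * xi
            h00 += w                 # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # Newton step = information^-1 @ score
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    # SE from the last iteration's information; the final step is below tol,
    # so this is effectively the information at the estimate.
    return b1, math.sqrt(h00 / det)


def wald_p(beta, se):
    """Two-sided p-value for the Wald z statistic beta / se."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


def screen(candidates, y, p_cut=0.25):
    """Keep variables whose univariate Wald p-value is below p_cut."""
    kept = []
    for name, x in candidates.items():
        beta, se = fit_one_predictor(x, y)
        if wald_p(beta, se) < p_cut:
            kept.append(name)
    return kept


# Hypothetical toy data: y = 1 for drinkers.
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
candidates = {
    "peer_use": [0, 0, 0, 0, 1, 0, 1, 1, 1, 1],  # strongly associated
    "sibling":  [0, 1, 0, 1, 0, 0, 1, 0, 1, 1],  # weakly associated
}
print(screen(candidates, y))  # ['peer_use']
```

The liberal 0.25 cut-off is deliberate: it lets through variables that may only become important once other variables are adjusted for.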
Finally, I would recommend identifying variables that differ significantly between drinkers and non-drinkers but are not really of interest to you. In disease research, these are commonly the comorbidities patients have, since patients tend to accumulate many problems as they age (depression, electrolyte imbalance, hypertension, etc.). Once you have identified these variables, run a logistic regression of the same dependent variable on only these variables. Before running it, specify in SPSS that you want the "logit" saved. This logit (the log-odds of the predicted probability) serves as the "propensity score." Do the run, and in the rightmost column of the data set you will see "logit_1". Next, in your risk prediction models, use your primary risk and adjustment factors, plus the logit to represent all the junk variables (which differed between drinkers and non-drinkers but are not of primary interest to you; these are also called confounders). This latter model, with the propensity score representing the nuisance factors, may be better than a model including all the nuisance variables.
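Outside SPSS, the same propensity-score step can be sketched in plain Python: fit a logistic model of the group indicator on the nuisance variables only, then compute each subject's linear predictor (the logit) to carry into the risk model as a single covariate. This is an illustrative Newton-Raphson fit, not SPSS's implementation; the function names and toy data are hypothetical, and it assumes no missing values and no complete separation.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for cc in range(c, n + 1):
                M[r][cc] -= f * M[c][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][cc] * x[cc] for cc in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson logistic regression; an intercept column is added to X."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for r, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            w = p * (1.0 - p)
            for i in range(k):
                score[i] += (yi - p) * r[i]
                for j in range(k):
                    info[i][j] += w * r[i] * r[j]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def propensity_logits(X_nuisance, group):
    """Linear predictor (log-odds) from a model of group on nuisance variables."""
    beta = fit_logistic(X_nuisance, group)
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X_nuisance]


# Hypothetical toy data: one binary nuisance variable (e.g. depression, 0/1)
# and the drinker indicator.
depression = [[0], [0], [0], [0], [1], [1], [1], [1]]
drinker = [0, 0, 0, 1, 0, 1, 1, 1]
logit_1 = propensity_logits(depression, drinker)
print([round(v, 3) for v in logit_1])
```

The resulting `logit_1` list would then enter the risk model alongside the primary risk and adjustment factors, in place of the individual nuisance variables.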
Last, for any regression model, I apply my own "DACOD" principle.
D - Distributions: check the distributions of the independent variables and of the model residuals.
A - Adjustment: don't forget the important adjustment variables. Age is usually in every model, since disease or risk prevalence correlates with age.
C - Coefficients: check the significance of the coefficients and their signs (+ or -).
O - Outliers: try to identify them and remove them, since they can strongly bias almost any regression model.
D - Diagnostics: look at residuals, DFFITS, DFBETAS, leverage, and variance inflation factors (VIFs).
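For the VIF part of the diagnostics, a convenient identity is that the VIFs are the diagonal elements of the inverse of the predictors' correlation matrix (equivalently, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors). A minimal plain-Python sketch, with hypothetical toy predictors `x1` and `x2`:

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length predictor columns."""
    centered = []
    for col in columns:
        m = sum(col) / len(col)
        centered.append([v - m for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def vifs(columns):
    """Variance inflation factors = diagonal of the inverted correlation matrix."""
    R = correlation_matrix(columns)
    k = len(R)
    # Gauss-Jordan inversion of R, working on the augmented matrix [R | I]
    A = [R[i][:] + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        pv = A[c][c]
        A[c] = [v / pv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[j][k + j] for j in range(k)]


# Hypothetical predictors with correlation r = 0.8, so each VIF = 1 / (1 - 0.64).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print(vifs([x1, x2]))
```

Because the VIF depends only on the predictors, the same calculation applies whether the outcome model is linear or logistic.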
As for your question about a beta value of 0.8 for age: it means that each additional year of age multiplies the odds of the outcome by exp(0.8) ≈ 2.23; that is, the odds ratio for a one-year increase in age is exp(0.8).
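The arithmetic can be checked in a few lines of Python. The baseline probability of 0.10 below is a made-up value, used only to show that the same odds ratio translates into different changes in probability depending on where you start.

```python
import math

beta_age = 0.8                    # logistic coefficient for age (per year)
odds_ratio = math.exp(beta_age)   # odds ratio for a one-year increase


def prob_after(baseline_prob, beta, delta=1.0):
    """Probability after increasing the predictor by delta units."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * math.exp(beta * delta)  # odds are multiplied, not added
    return new_odds / (1.0 + new_odds)


print(round(odds_ratio, 3))                  # 2.226
print(round(prob_after(0.10, beta_age), 3))  # 0.198
```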
Get Hosmer & Lemeshow's "Applied Logistic Regression" textbook (New York: Wiley) if you want to read up on logistic regression. Also, UCLA has a good statistics resource covering almost every type of regression; it is geared toward Stata, but the principles are the same.