Instructions for Analyzing Data from CAHPS Surveys in SAS
(1) Estimation of case-mix regression coefficients. We may think of this calculation as proceeding in
two stages: first calculating sufficient statistics (the statistics for each entity used in calculating the
coefficients) for regressions within each entity, and then pooling these estimates across the entities,
weighting the sufficient statistics by the corresponding entity weight. The weighting issues in the first
stage concern the weights given individual cases in the sufficient statistics for the within-entity
regressions, and in the second stage concern the weights used when pooling the within-entity estimates
across entities. In general, unweighted within-entity regression estimates will be biased and inconsistent
if the weights are related to residuals from the regression, so it is advisable to use the within-entity
weights (if they are available) unless it is known that the sampling was conducted in a way that does not
create bias if the weights are ignored.
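The two-stage calculation can be illustrated with a minimal Python sketch. It is not the macro's SAS implementation: it assumes a single adjuster variable x, an entity-specific intercept, and within-entity statistics normalized by the entity's total case weight. The function name pooled_slope and its arguments are hypothetical.

```python
from collections import defaultdict


def pooled_slope(cases, entity_weight):
    """Illustrative two-stage estimate of one case-mix coefficient.

    cases: iterable of (entity, case_weight, x, y) tuples.
    entity_weight: dict mapping entity -> weight used when pooling.
    Assumes one adjuster x and a separate intercept for each entity.
    """
    # Stage 1: accumulate weighted sufficient statistics within each entity
    # (sum of w, w*x, w*y, w*x*x, w*x*y).
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0, 0.0])
    for entity, w, x, y in cases:
        s = sums[entity]
        s[0] += w
        s[1] += w * x
        s[2] += w * y
        s[3] += w * x * x
        s[4] += w * x * y

    # Stage 2: pool the centered, normalized statistics across entities,
    # weighting each entity's statistics by its entity weight.
    numerator = 0.0
    denominator = 0.0
    for entity, (sw, swx, swy, swxx, swxy) in sums.items():
        mean_x = swx / sw
        mean_y = swy / sw
        sxx = swxx / sw - mean_x * mean_x  # weighted variance of x
        sxy = swxy / sw - mean_x * mean_y  # weighted covariance of x and y
        numerator += entity_weight[entity] * sxy
        denominator += entity_weight[entity] * sxx
    return numerator / denominator


# Toy data in which y = 2*x plus an entity-specific intercept.
toy_cases = [
    ("A", 1.0, 0.0, 1.0),
    ("A", 2.0, 1.0, 3.0),
    ("A", 1.0, 2.0, 5.0),
    ("B", 1.0, 0.0, 10.0),
    ("B", 1.0, 3.0, 16.0),
]
slope_equal = pooled_slope(toy_cases, {"A": 1.0, "B": 1.0})
slope_unequal = pooled_slope(toy_cases, {"A": 4.0, "B": 2.0})
```

In this toy data the relationship is exact within each entity, so the choice of entity weights does not change the slope; with real data the entity weights determine how much each entity's statistics influence the pooled coefficient.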
There is more leeway in choice of weights at the entity level when pooling the within-entity estimates.
Weighting each entity’s statistics by the sum of the case weights of cases in an entity yields estimated
coefficients that are representative of the entire population, by weighting the data from each entity by the
total population of the entity. While population representativeness is a common objective for analysis of
surveys, it has some disadvantages in CAHPS surveys because CAHPS results are reported for entities
rather than the population as a whole. If a few entities have much larger populations than others, they
could dominate estimation of the coefficients in a weighted regression; this could be undesirable because
the objective of regression modeling in case-mix adjustment is to estimate a model that fits reasonably
well across all the entities being compared, not just the largest ones or the pooled population.
Furthermore, such disproportionate weighting is generally less efficient statistically than a weighting that
is more uniform, yielding larger variances for the same amount of data. This approach may nonetheless be
desirable if the primary goal of the analysis is to obtain nationally representative estimates, for example,
for national comparison of subgroups of patients that cut across entities, such as those in different regions
or racial/ethnic groups.
Another option is to weight each entity’s data equally; this can be implemented by dividing each case’s
weight by the total weight for the entity. This suits CAHPS analyses whose primary objective is to
compare entities or to examine effects of entity-level factors, but it may be inefficient if the
sample sizes per entity vary greatly, especially if some entities have very large samples.
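As a rough illustration of equal entity weighting, the following Python sketch divides each case weight by the total weight of its entity, so that the rescaled weights sum to 1 within every entity. The function name equal_entity_weights and the (entity, weight) input layout are assumptions made for this example, not part of the macro.

```python
def equal_entity_weights(cases):
    """Rescale case weights so they sum to 1 within each entity.

    cases: list of (entity, weight) pairs (hypothetical layout).
    Returns the rescaled weights in the same order as the input.
    """
    # Total case weight per entity.
    totals = {}
    for entity, w in cases:
        totals[entity] = totals.get(entity, 0.0) + w
    # Divide each case weight by its entity's total weight.
    return [w / totals[entity] for entity, w in cases]


example = [("A", 2.0), ("A", 6.0), ("B", 5.0)]
equal_w = equal_entity_weights(example)
```

Each entity then contributes the same total weight to the pooled regression, regardless of its population or sample size.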
A third weighting option weights each entity by its number of respondents (“precision weighting”); this
can be implemented by multiplying each case’s weight by the ratio of number of respondents to total
weight for the entity. Holding other things approximately equal across entities (such as the residual
variance and the within-entity distribution of characteristics), this is statistically the most efficient
method. In this option, entities with small samples do not gain disproportionate weight. While the largest
entity samples do gain more influence in the regression, in many CAHPS applications the sample sizes
are bounded by design (or by limited resources) so a large entity population does not translate into a
proportionately large entity influence in the regression. A possible disadvantage of this method is that it
depends on the sample design and response/nonresponse patterns, and therefore has no clear population
interpretation. Nonetheless, we recommend this as the default option because it is the most robust and
often most statistically efficient method.
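Precision weighting can be sketched in the same way. The Python example below multiplies each case weight by the ratio of the entity's number of respondents to its total weight, so that the rescaled weights in each entity sum to the respondent count. The function name precision_weights and the (entity, weight) input layout are hypothetical, and every listed case is assumed to be a respondent.

```python
def precision_weights(cases):
    """Rescale case weights so they sum to the respondent count per entity.

    cases: list of (entity, weight) pairs, one per respondent
    (hypothetical layout). Returns weights in the input order.
    """
    totals = {}  # total case weight per entity
    counts = {}  # number of respondents per entity
    for entity, w in cases:
        totals[entity] = totals.get(entity, 0.0) + w
        counts[entity] = counts.get(entity, 0) + 1
    # Multiply each weight by (respondents / total weight) for its entity.
    return [w * counts[entity] / totals[entity] for entity, w in cases]


example = [("A", 2.0), ("A", 6.0), ("B", 5.0)]
precision_w = precision_weights(example)
```

Relative weights within an entity are unchanged; only the entity's overall influence is rescaled to its number of respondents.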
The final calculation of case (individual) weights for the case-mix regressions can be understood as
consisting of three steps:
• First, calculate within-entity weights that sum to 1 in each entity; these are equal to the weights
provided to the macro divided by the sum of the weights in each entity.