Instructions for Analyzing Data from CAHPS® Surveys in SAS®:
Using the CAHPS Analysis Program Version 5.0
AHRQ Contract No.: HHSP233201500026I/HHSP23337004T
Managed and prepared by:
Westat, Rockville, MD
Naomi Yount, Ph.D.
Kayo Walsh, M.S. (Harvard Medical School)
Alan Zaslavsky, Ph.D. (Harvard Medical School)
Edited by Lise Rybowski, MBA
AHRQ Publication No. 20-M019
August 2020
Updated 8/2020
Public Domain Notice. This product is in the public domain and may be used and reprinted without
permission in the United States for noncommercial purposes, unless materials are clearly noted as
copyrighted in the document. No one may reproduce copyrighted materials without the permission of the
copyright holders. Users outside the United States must get permission from AHRQ to reprint or translate
this product. Anyone wanting to reproduce this product for sale must contact AHRQ for permission.
CAHPS® is a trademark of AHRQ.
Suggested Citation:
Yount, N., Walsh, K., & Zaslavsky, A. Instructions for Analyzing Data from CAHPS® Surveys in SAS®:
Using the CAHPS Analysis Program Version 5.0, (Prepared by Westat, Rockville, MD, under Contract
No. HHSP233201500026I). Rockville, MD: Agency for Healthcare Research and Quality; August 2020.
AHRQ Publication No. 20-M019.
The authors of this report are responsible for its content. Statements in the report should not be
construed as endorsement by the Agency for Healthcare Research and Quality or the U.S.
Department of Health and Human Services.
No investigators have any affiliations or financial involvement (e.g., employment, consultancies,
honoraria, stock options, expert testimony, grants or patents received or pending, or royalties) that
conflict with material presented in this report.
Contents
1. Introduction
2. What Does the CAHPS Analysis Program Do?
3. What Is Included in the CAHPS Analysis Program?
4. Pre-Analysis Decisions
5. Software Requirement and Data File Specifications
6. CAHPS Macro Parameters and Call Statements
7. SAS Data Sets Generated by the CAHPS Analysis Program
Appendices
Appendix A. Using the Test Data in the CAHPS Analysis Program
Appendix B. Statistical Explanation of Macro Parameters
Appendix C. Summary of Features Included in Each Version of the CAHPS Analysis Program
Tables
Table 5.1 Yes/No Variables
Table 5.2 Three Response Variables
Table 5.3 Four-Point Frequency Scale Variables
Table 5.4 Example of Creating Dummy Variables – What is your age?
Table 6.1 Required Parameters for the CAHPS Macro 5.0
Table 6.2 Optional Parameters for CAHPS Macro 5.0
Table 6.3 Original Data and Data Used for Analysis
Table 7.1 SAS Data Sets Output for All CAHPS Macro Calls
Table 7.2 Additional SAS Data Sets Output from Case-Mix Adjustment
Table 7.3 Additional SAS Data Sets Output from Saving Case-Mix Adjusted Frequencies
Table 7.4 Additional SAS Data Sets Output from Post-Stratification Weighting Option
Table 7.5 Additional SAS Data Sets Output from Saving Case-Mix Adjusted Frequencies for Post-Stratification Weighting Option
Table 7.6 Contents of DP&OUTNAME: Plans Dropped
Table 7.7 Contents of LR&OUTNAME: Lists Plans with 100 or Fewer Records
Table 7.8 Contents of N_&OUTNAME: Response Option Percentages
Table 7.9 Contents of OA&OUTNAME: Results from Test for Significant Differences
Table 7.10 Contents of P_&OUTNAME: Percent Missing Data
Table 7.11 Contents of SA&OUTNAME: Plan Level Results
Table 7.12 Contents of C_&OUTNAME: Case-mix Adjustment Regression Coefficients
Table 7.13 Contents of R2&OUTNAME – R-Squared Results for the Case-mix Adjustment Regression
Table 7.14 List of Macro Results – Y_&OUTNAME
Table 7.15 List of Macro Results – NW&OUTNAME (Similar to N_&OUTNAME but provides the post-stratification weighted results)
Table 7.16 Contents of OW&OUTNAME (Similar to OA&OUTNAME but provides post-stratification weighted results)
Table 7.17 Contents of SW&OUTNAME (Similar to SA&OUTNAME but provides post-stratification weighted results)
Table A.1 Sample Data for Post-Stratification Weighting
Table A.2 Description of test data set variables based on CAHPS Clinician & Group Adult Survey 3.0
Table B.1 List of Macro Weight Options
Table B.2 Case Weighting Used for Case-mix Coefficients
Table B.3 Examples of a composite measure with three items using a macro parameter K
Table B.4 Examples of composite measure with three items using a macro parameter K and means
1. Introduction
The CAHPS Analysis Program, often referred to as the CAHPS macro, uses SAS® software to provide
survey users with a flexible way to analyze CAHPS survey data in order to make valid comparisons of
performance. The program can be applied to any of the CAHPS surveys. This document explains how the
CAHPS Analysis Program works and how to use the program to analyze and interpret survey results.
2. What Does the CAHPS Analysis Program Do?
The CAHPS Analysis Program is designed to analyze CAHPS survey data by doing the following tasks:
Calculating scores. The program calculates scores for all survey measures, including individual
survey items, ratings, and multi-item composite measures. (Learn about composite measures in
the box below).
Adjusting for case mix. The program adjusts the survey data for standard individual case-mix
variables such as respondent age, education, and general and mental health status. This
adjustment makes it more likely that reported differences are due to real differences in
performance rather than differences in the characteristics of enrollees or patients.
Comparing scores. The output from the program also compares the performance of any specific
entity (e.g., health plan, hospital, provider group) included in the data set to the overall
performance of all entities.
For each CAHPS measure, the main output results from this program include:
unadjusted scores (e.g., top box scores)
number of responses used in the analyses
overall mean score
case-mix adjusted scores (if applicable)
case-mix adjuster coefficients (if applicable)
significance rating between case-mix adjusted scores and overall score
adjusted percentages for display in three-bar frequency charts (top box, middle box, bottom box)
What Are Composite Measures?
Composite measures combine CAHPS survey questions that measure the same dimensions of patients’
experiences with health care or health plan services. The use of composite measures simplifies the
interpretation of the data, enhances the reliability of the results (because individual survey items are
often less reliable than combinations of multiple items), and facilitates comparisons of performance
across a unit of analysis (e.g., health plan, medical practice, clinician).
3. What Is Included in the CAHPS Analysis Program?
The CAHPS Analysis Program version 5.0 has three core components: a SAS macro program, SAS test
programs (including the formats), and a test SAS data set. You can download a ZIP file with the program
files and data sets from the Agency for Healthcare Research and Quality’s CAHPS web page about
analyzing CAHPS survey data.
The ZIP file contains the following files:
MACRO_CAHPS50.SAS – This is the core SAS macro program that performs the analyses the
user specifies in the SAS test program. The macro file should not be modified.
_1_TEST_FORMAT_CAHPS50.SAS – This program creates formats, which are helpful to
view the data with descriptive words instead of the numeric data values assigned in data (e.g.,
“Always” is shown rather than a “4”).
_2_TEST_PREPDATA_CAHPS50.SAS – This program contains sample code to create
recoded versions of some variables for the macro run (e.g., by creating dichotomous or reverse-
coded variables).
_3A_TEST_CAHPS50.SAS – This short program provides sample code for calling the macro
program with different analysis options and outputs specified.
_3B_TEST_CAHPS50_STRATIFIED.SAS – This program contains sample code for calling
the macro with the post-stratification weighting option.
TEST_CAHPS50_DATA.SAS7BDAT – This sample SAS data set is used with all the test
programs listed above.
TEST_CAHPS50_DATA_recoded.SAS7BDAT – This sample SAS data set is similar to
TEST_CAHPS50_DATA.SAS7BDAT, except the recoded variables created by
_2_TEST_PREPDATA_CAHPS50.sas have already been created for you. This data set is for
users who do not need to use _2_TEST_PREPDATA_CAHPS50.sas.
Appendix A explains how to use the test data. Appendix B provides a statistical explanation of some of
the macro parameters. Appendix C explains all the changes made to the various versions of the CAHPS
Analysis Program over the years.
4. Pre-Analysis Decisions
The CAHPS Analysis Program offers the user a number of options for analyzing the survey data. Before
preparing to run the program, analysts should make sure that the project team has agreed upon answers to
the following questions. Their implications for the CAHPS Analysis Program are reviewed below. A list
of all macro parameters is in Section 6.
What is the reporting unit (entity)?
Any analysis of CAHPS data is intended to assess, compare, and report on some type of reporting unit.
Examples of such units include health plans, hospitals, provider groups, clinics, sites of care, and
individual physicians. To avoid confusion, these instructions use the neutral term “entity” to refer to the
unit whose data will be aggregated and analyzed.
Because the CAHPS Analysis Program was initially written for the CAHPS Health Plan Survey, the
reporting unit variable name used in the Analysis Program is “Plan.” This name has no bearing on
the suitability of the program for analyzing data on other types of entities.
Depending on your data collection design, you may be able to use the same data for more than one type of
entity. For example, you could analyze a data set to compare provider groups and then analyze the same
data to assess individual doctors.
Do you need to adjust the results for case mix?
Case mix refers to the distribution of respondents’ health status and sociodemographic characteristics,
such as age or educational level, that may affect survey responses. Without an adjustment, differences
among entities could be due to case-mix differences rather than true differences in quality. (See Case-mix
Adjustment in Appendix B for more information.)
The Analysis Program offers an ADJUSTER macro parameter to include case-mix adjustment as well as
an IMPUTE macro parameter to impute case-mix variables if your case-mix variables have some missing
values.
Will you analyze adult and child surveys together?
The Analysis Program allows users to specify how child and adult surveys will be analyzed. The project
team needs to decide whether to analyze surveys about adults and children separately or together. If you
are analyzing adult and child survey data together, the team must also decide whether to consider
interaction effects. Interaction effects may be an issue when the impact of age or health status on one of
the reporting items depends on whether you are analyzing an adult or child survey. You can adjust for
interaction effects when combining adult and child data by using the ADULTKID macro parameter. (See
Variable CHILD in Section 5.)
What p-value will you use to test statistical significance in the analysis?
A p-value of 0.05 is frequently used to test for statistically significant differences between the entities
being compared. If you choose a different p-value, you can specify it in the Analysis Program using the
PVALUE macro parameter.
What, if any, level of substantive (practical) significance will you use to compare
performance?
Substantive significance refers to an absolute difference between the entities being compared that must be
achieved for that difference to be considered meaningful. For example, two health plans may have
statistically significant differences in average scores based on the selected p-value, but the substantive
difference between the plans’ mean scores may not be large enough to be considered meaningful.
The Analysis Program has two options that allow the user to specify a difference that is substantive. You
can use these options simultaneously or specify only one.
First method. The team decides on a percentage of the distance to the nearest bound that would be a
meaningful difference between entities. You can enter this fraction in the Analysis Program using the
CHANGE macro parameter.
Second method. A much simpler method is to specify an absolute difference that must exist between the
entity’s mean score and the mean score for all entities in the analysis for a difference to be considered
substantive. For this method, you can specify the absolute difference considered to be meaningful using
the MEANDIFF macro parameter.
Do results need to be analyzed using weighting?
In general, weights carry information that helps to make the data more representative of the target
population whose experiences are being assessed. Weighting can arise at three points in the computations
performed by the Analysis Program:
(1) Estimation of case-mix regression coefficients
(2) Calculation of adjusted entity mean scores
(3) Calculation of an overall mean score and significance tests of differences from the overall mean
score.
To set up weights in the Analysis Program, you can use the WGTRESP, WGTMEAN, and WGTPLAN
macro parameters. (See Case Weighting in Appendix B.)
How do you want to weight the items in the composite measure?
A composite measure requires a more elaborate computation to develop the mean score because it
includes more than one item. Users can calculate the composite measure score using one of three
weight options: (1) weight the items by the number of respondents, (2) weight the items by the sum
of the respondent weights, or (3) weight the items equally, so that the composite score is the simple
average across the items. For the equal weight option, users can select an option to adjust the item weight if some
of the items in the composite measure have a low response rate. (See Case Weighting in Appendix B.)
5. Software Requirement and Data File Specifications
SAS® Software Requirement
The CAHPS Analysis Program was developed using SAS software. Running the program requires Base
SAS software and the SAS/STAT module. Base SAS, which is required to use any SAS product, provides
the print commands, simple plotting capabilities, and procedures for descriptive statistics needed to run
the Analysis Program. The SAS/STAT module adds statistical procedures, including the regression
procedures PROC REG and PROC SURVEYREG, which perform part of the case-mix calculations.
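Before running the Analysis Program, it can help to confirm that both Base SAS and SAS/STAT are available in your installation. The short sketch below is one way to check; it is not part of the CAHPS Analysis Program. Both procedures write their results to the SAS log.

```sas
/* List the products licensed at this site; look for SAS/STAT in the log. */
proc setinit;
run;

/* List the installed products and their version numbers in the log. */
proc product_status;
run;
```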
Data Set Structure
Each row or case in a SAS data set represents data from a unique questionnaire. Appendix A offers
examples of how to meet many of the variable coding and cleaning requirements before using the
Analysis Program.
If data from different CAHPS questionnaires are in the same data set, responses for equivalent questions
are listed under the same variable names, with each row representing data for a unique questionnaire.
Sample Size Requirements
Number of entities (e.g., health plans or providers). The data set must have surveys from at least two
entities. If there is only one entity in the data being analyzed, statistical comparisons cannot be performed
and some parts of the program will not work properly. All the reports will still be produced, but some of
the results will be of limited value.
Number of responses per entity. The Analysis Program requires at least two responses per entity. The
program flags entities with fewer than 100 responses for an individual measure, but performs the analysis
on all entities with at least two records. Including entities with very little data may reduce the precision of
comparisons between individual entities or providers and the overall mean scores.
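A quick count of usable responses per entity before calling the macro can show which entities fall below these thresholds. The sketch below is an illustration only: the data set name survey_recoded and the item name q05_re are assumptions, and the macro performs its own flagging regardless of this check.

```sas
/* Count nonmissing responses to one measure for each entity (PLAN).
   survey_recoded and q05_re are hypothetical names. */
proc sql;
    create table plan_counts as
    select plan,
           count(q05_re) as n_resp,                      /* nonmissing responses    */
           (calculated n_resp < 2) as too_few,           /* 1 = fewer than two      */
           (calculated n_resp < 100) as flag_lt100       /* 1 = fewer than 100      */
    from survey_recoded
    group by plan
    order by n_resp;
quit;

/* Review the entities with the fewest responses first. */
proc print data=plan_counts;
run;
```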
Variable Naming Requirements
The variable names PLAN, CHILD, VISITS, and SPLIT have specific meaning for the Analysis
Program. If the data set has other variables with these names that do not conform to the specifications
below, the macro may produce errors in the log file and the results may be erroneous. Additionally,
the program uses the prefix RES_ in its own array statements, so variables in your data set whose names
start with RES_ may cause errors.
Variable PLAN. The variable PLAN, which refers to the reporting unit or entity, must be included in
your data set. While the variable can be any type of entity, it must be called PLAN for the Analysis
Program to work.
The Analysis Program accepts alphanumeric, character, and numeric formats for this variable in the data
set. Note that this is the only variable that does not have to be coded numerically. The maximum variable
length for PLAN is 40 characters.
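If the entity identifier in your data set has a different name, one approach is to copy it into a new variable called PLAN. The sketch below assumes a hypothetical input data set survey_raw with a hypothetical character variable group_name that identifies the entity.

```sas
/* Create PLAN from an existing entity identifier.
   survey_raw and group_name are hypothetical names. */
data survey_with_plan;
    length plan $40;              /* PLAN may be at most 40 characters long */
    set survey_raw;
    plan = strip(group_name);     /* values longer than 40 characters are truncated */
run;
```

Because truncation could merge two distinct entities into one, it is worth checking that the identifier values are unique within their first 40 characters.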
Variable CHILD. If your data set combines adult and child data and
You will analyze the data together (macro parameter ADULTKID = 1) or
You will conduct child-only analyses (ADULTKID = 2),
you will need to create the dichotomous variable CHILD to distinguish between adult (CHILD = 0) and
child (CHILD = 1) surveys.
If the CHILD variable is missing from the data set, the Analysis Program creates a CHILD variable and
sets it to CHILD = 0 (Adult).
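The CHILD variable can be created in a DATA step before the macro is called. The sketch below assumes a hypothetical data set survey_raw with a hypothetical character variable survey_type coded 'A' for adult surveys and 'C' for child surveys; substitute whatever field distinguishes the two survey types in your data.

```sas
/* Create the dichotomous CHILD variable.
   survey_raw and survey_type are hypothetical names. */
data survey_with_child;
    set survey_raw;
    if survey_type = 'C' then child = 1;        /* child survey */
    else if survey_type = 'A' then child = 0;   /* adult survey */
    else child = .;                             /* unrecognized type: review before analysis */
run;
```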
Requirements for Recoding Survey Response Options
All analytic variables used by the Analysis Program must be numeric. The tables below show the
different types of variables that may need to be recoded.
Yes/No Variables. Variables with “yes/no” response categories for analysis should be coded as 0 (No)
and 1 (Yes) as shown in Table 5.1. All variables with dichotomous response options should be coded in
this manner. For easier interpretation of the results, the “positive” response should have the highest value.
Data for dichotomous variables will most likely need to be recoded as the precodes found in the survey
instrument for the response values typically set the values of the responses to 2 and 1 rather than 0 and 1.
Table 5.1 Yes/No Variables
Typical Response Value    Recoded Numeric
on CAHPS Surveys          Response Value     Label/Description
2                         0                  No
1                         1                  Yes
Any other value           . (Missing)        Not analyzed by the CAHPS Analysis Program
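The recode in Table 5.1 can be sketched in a DATA step as follows. The data set names (survey_raw, survey_recoded) and the raw item name q05 are assumptions made for illustration; Appendix A and _2_TEST_PREPDATA_CAHPS50.SAS contain the sample code distributed with the program.

```sas
/* Recode a yes/no item from the survey precodes (1 = Yes, 2 = No)
   to the 0/1 coding expected by the Analysis Program.
   survey_raw, survey_recoded, and q05 are hypothetical names. */
data survey_recoded;
    set survey_raw;
    if q05 = 1 then q05_re = 1;         /* Yes */
    else if q05 = 2 then q05_re = 0;    /* No  */
    else q05_re = .;                    /* any other value becomes missing */
run;
```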
Three Response Variables. Any variable with three response options should be coded as shown in Table
5.2. For easier interpretation of the results, the “positive” response should have the highest value. Reverse
coding may be necessary to ensure that the most positive response – for example, “Yes, definitely” – has
the highest value.
Table 5.2 Three Response Variables
Typical Response Value    Recoded Numeric
on CAHPS Surveys          Response Value     Label/Description
1                         3                  Yes, definitely
2                         2                  Yes, somewhat
3                         1                  No
All other values          . (Missing)        Not analyzed by the CAHPS Analysis Program
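Because the recode in Table 5.2 simply reverses the order of the three values, it can be written as a subtraction from 4. The sketch below uses hypothetical names (survey_raw, survey_recoded, q10).

```sas
/* Reverse-code a three-response item so that "Yes, definitely" has the
   highest value: 1 -> 3, 2 -> 2, 3 -> 1.
   survey_raw, survey_recoded, and q10 are hypothetical names. */
data survey_recoded;
    set survey_raw;
    if q10 in (1, 2, 3) then q10_re = 4 - q10;
    else q10_re = .;                    /* all other values become missing */
run;
```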
Four-Point Frequency Scale Variables. Variables with “never” to “always” response options are coded
as shown in Table 5.3. All variables with four response options should be coded in this manner. For easier
interpretation, the “positive” response – for example, “Always” or “Definitely yes” – should have the
highest value.
Table 5.3 Four-Point Frequency Scale Variables
Typical Response Value    Recoded Numeric
on CAHPS Surveys          Response Value     Label/Description
4                         1                  Definitely no
3                         2                  Somewhat no/Probably no
2                         3                  Somewhat yes/Probably yes
1                         4                  Definitely yes
All other values          . (Missing)        Not analyzed by the CAHPS Analysis Program
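When several items share the same four-point scale, the reverse coding in Table 5.3 can be applied to all of them at once with arrays. The sketch below assumes three hypothetical items (q20, q21, q22) in a hypothetical data set survey_raw.

```sas
/* Reverse-code several four-point items in one pass: 4 -> 1, 3 -> 2,
   2 -> 3, 1 -> 4. All data set and variable names are hypothetical. */
data survey_recoded;
    set survey_raw;
    array raw_items{3} q20 q21 q22;
    array rev_items{3} q20_re q21_re q22_re;
    do i = 1 to 3;
        if raw_items{i} in (1, 2, 3, 4) then rev_items{i} = 5 - raw_items{i};
        else rev_items{i} = .;          /* all other values become missing */
    end;
    drop i;
run;
```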
Coding for Case-Mix Adjuster Variables. If the project team decides to adjust the survey results for
case mix, numeric variables must also be properly coded for each adjuster variable. If the adjuster
variable is used as a continuous variable, the effects associated with the categories are assumed to be
proportional to the differences among the coded values. This approach differs from recoding the
adjuster variable as a set of dichotomous (dummy) variables with a reference category. (See Appendix A for
additional sample SAS code.) The dummy variable corresponding to one category, the “reference
category,” should be omitted; coefficients corresponding to any other category represent the estimated
effect of being in that category relative to the reference category. While the choice of reference category
has no effect on the case-mix adjustment results, it is common to use the category with the most responses
(or close to the most) as the reference category. Dummy variable coding allows the differences between
effects associated with different responses to be determined by the data rather than assuming any
particular pattern.
Table 5.4 is an example of recoding AGE into a set of dummy variables, one of which should be omitted
from the ADJUSTER macro parameter as the reference category. The response categories on the CAHPS
surveys may differ from this example, so please refer to the survey for the appropriate response options.
Table 5.4 Example of Creating Dummy Variables – What is your age?
Typical Response Value                                               Label/Description
on CAHPS Surveys          Recoded Dummy Variable and Value           (Years)
1                         1 = Age_24under, 0 = not in the group      18 to 24
2                         1 = Age25_34, 0 = not in the group         25 to 34
3                         1 = Age35_44, 0 = not in the group         35 to 44
4                         1 = Age45_54, 0 = not in the group         45 to 54
5                         1 = Age55_64, 0 = not in the group         55 to 64
6                         1 = Age65_74, 0 = not in the group         65 to 74
7                         1 = Age75older, 0 = not in the group       75 or older
All other values          Code to missing
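The dummy variables in Table 5.4 could be created along the following lines. This is a sketch under stated assumptions: the data set names and the raw variable name age (holding the response values 1 to 7) are hypothetical, and the sample code in Appendix A remains the reference.

```sas
/* Create one 0/1 dummy variable per age category. Dummy variable names
   follow Table 5.4; survey_raw, survey_recoded, and age are hypothetical. */
data survey_recoded;
    set survey_raw;
    array age_dummy{7} Age_24under Age25_34 Age35_44 Age45_54
                       Age55_64 Age65_74 Age75older;
    do i = 1 to 7;
        if age in (1, 2, 3, 4, 5, 6, 7) then age_dummy{i} = (age = i);  /* 1 if in group, else 0 */
        else age_dummy{i} = .;                                          /* invalid age: missing  */
    end;
    drop i;
run;
```

When listing these variables in the ADJUSTER macro parameter, leave out the one chosen as the reference category.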
6. CAHPS Macro Parameters and Call Statements
The CAHPS Analysis Program requires six key parameters (VAR, VARTYPE, NAME, ADULTKID,
DATASET, and OUTNAME). Table 6.1 below lists the required parameters with the valid value ranges;
these parameters have no default value and therefore must be specified. Table 6.2 lists 23 optional
parameters, also with the valid value ranges. These optional parameters all have default values and do not
need to be specified if the default value is acceptable.
If you are using case-mix adjusters, the ADJUSTER parameter is required. The order of the parameters in
the macro call statement does not matter. Parameters should be separated by a comma.
Table 6.1 Required Parameters for the CAHPS Macro 5.0
Required
Parameter Description Values
Var
Name(s) of variable(s)
being analyzed (composite
measure items, global
rating items, or other
single items)
Name(s) of variable(s) from SAS data set to include in
the analysis (e.g., composite measure items, global
rating, or other single items). For composite measure
items, separate the variable names by a single space.
Vartype
Type of variable
Note: variables in
composite measures
should have the same type.
1 = Dichotomous scale (yes/no 0-1)
2 = Global rating scale (0-10)
3 = “How often” scale or other four-point
response scale (“never to always” scale 1-4)
4 = Any type of three-point response scale (1-3)
5 = Other scale (Must assign a value to min_resp
and max_resp arguments)
Name
Description of composite
measure, global rating
item, or other items
Note: This parameter is limited to 40 characters and
can be numeric, text, or a combination of
both.
Adultkid
Specifies how to analyze
child and adult surveys.
Note: If analyzing data
other than adult only, the
CHILD variable must be
included in the data set.
0 = Combine adult and child survey data in
analysis; do not consider interaction effects in
case-mix adjustment. This option can be used
if the data set contains only a single type of
survey.
1 = Combine adult and child survey data in
analysis; consider interaction effects between
child and each case-mix adjuster variable. For
more details, see the box below on Adult and
Child Interactions.
2 = Analyze child data only.
3 = Analyze adult data only.
Dataset
SAS data set name to be
used in the analysis
Name of the SAS data set used in the input file (i.e.,
your data set of CAHPS survey responses).
Outname
Part of SAS data set name
for output tables created
for summary results
Name for the SAS data sets saved with the summary
results. To avoid creating SAS data sets, enter ‘ ‘.
The results tables will still be created for the .out file.
Macro Parameter ADULTKID: Adult and Child Interactions
When the macro parameter ADULTKID equals 1, the macro creates adult and child interactions for the
adjuster variables. The macro creates additional adjuster variables, with the naming convention AC1,
AC2, ..., ACn, where n is the total number of adjusters originally submitted in the macro call parameter
ADJUSTER. When there is an adult and child interaction, the macro creates the ACx variables by
looping through the list of adjusters.
For example:
If your adjuster variables are for general health status (GHR), age, and education, then the
following additional interaction adjuster variables are created:
AC1 = GHR * CHILD
AC2 = AGE * CHILD
AC3 = EDUCATION * CHILD
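The macro builds these interaction variables itself, so you do not need to create them. Purely to illustrate what the products mean, the DATA step below computes the same three terms by hand; the data set names are hypothetical, and the adjuster names follow the example above.

```sas
/* Illustration only: the interaction terms the macro creates when
   ADULTKID = 1. Each term equals the adjuster value for child surveys
   (CHILD = 1) and 0 for adult surveys (CHILD = 0).
   survey_recoded and interactions_demo are hypothetical names. */
data interactions_demo;
    set survey_recoded;
    AC1 = GHR * child;
    AC2 = age * child;
    AC3 = education * child;
run;
```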
Example SAS Macro Call with Only the Required Parameters
The example call statement below includes only the required parameters. Explanatory text is provided
next to each parameter between the “/*” and “*/”. All text in between these characters is commented out
or ignored by SAS. This example shows a single item (q05_re) being analyzed.
%cahps(
var = q05_re, /*Name of variable to be analyzed*/
vartype = 1, /*Set the type of variable: 1=
dichotomous scale (1/0)*/
name = Make appt for an illness, /*Label for the outcome variable*/
adultkid = 3, /*Specify how to analyze child and
adult surveys: 3= analyze adult data
only*/
dataset = test, /*Name of the input data*/
outname = illness /*Name used for the output data set*/
) ;
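A call that adds case-mix adjustment might look like the sketch below, which combines the required parameters with the optional ADJUSTER, IMPUTE, and ADJ_BARS parameters described in Table 6.2. The file path, the item names (q06, q08, q09), the adjuster names, and the label are assumptions for illustration; the sketch also assumes that including MACRO_CAHPS50.SAS is what makes %cahps available in the session.

```sas
/* Load the macro definition (the path is hypothetical). */
%include "C:\cahps\MACRO_CAHPS50.SAS";

%cahps(
    var      = q06 q08 q09,          /* composite items, separated by a space    */
    vartype  = 3,                    /* 3 = four-point response scale (1-4)      */
    name     = Getting timely care,  /* label, limited to 40 characters          */
    adultkid = 3,                    /* 3 = analyze adult data only              */
    dataset  = test,                 /* name of the input data set               */
    outname  = timely,               /* name used for the output data sets       */
    adjuster = ghr age education,    /* case-mix adjusters, separated by a space */
    impute   = 1,                    /* 1 = impute plan means for adjusters      */
    adj_bars = 1                     /* 1 = case-mix adjust the frequency bars   */
) ;
```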
Table 6.2 Optional Parameters for CAHPS Macro 5.0
Parameter Description Values
Describing Variables
Min_resp
Used with vartype = 5 only—
the minimum response value
Can be any numeric value. It will be used as the low
value for the valid response options.
Max_resp
Used with vartype = 5 only—
the maximum response value
Can be any numeric value. It will be used as the
high value for the valid response options.
Recode
Recodes the global rating and
the “How often” or 4-point
scales down to three
categories before performing
the case-mix adjustment and
the statistical tests. The
default value is 0.
Default recode into 3
categories is as follows:
Rating Scale: Vartype = 2
1. 0-6
2. 7-8
3. 9-10
4-point Scale: Vartype = 3
1. 1-2
2. 3
3. 4
NOTE: If Vartype is not
equal to 2 or 3, then no
recoding.
0 = For the statistical tests, use uncollapsed
response options for the variables in the
Var parameter.
For the “Percent of each response” table
and report, use the default recode
collapsing into 3 categories.
This method is the default; the RECODE
option is not needed in the macro call if
it = 0.
1 = For the statistical tests and the “Percent of
each response” table, use the default recode
collapsed into 3 categories.
2 = For the statistical tests, use uncollapsed
response options for the variables in the
Var parameter.
For the “Percent of each response” table
and report, split the “Rating” scale into
three categories with the following break
points: 0-7|8-9|10. The recode for this
option differs from the default recode
such that the top box is just the highest
response for the rating scale.
Rating Scale: Vartype = 2
1. 0-7
2. 8-9
3. 10
For the 4-point scale, use the default recode
collapsed into 3 categories: 1-2|3|4.
3 = For the statistical tests and the “Percent of
each response” table, use the recoded
collapsed categories as shown in Recode
option 2 above.
For Case-Mix Adjusting
(only “Adjuster” is required for case-mix adjusting; other case-mix parameters have default values
if not specified)
Adjuster
Name(s) of case-mix adjuster
variables
Name(s) of case-mix adjuster variables—separated
by a space if using more than one case-mix variable.
Adj_bars
Flag indicating if the
frequencies for the response
values are to be case-mix
adjusted for the triple stacked
bar (top box, middle box,
bottom box). The default
value is 0.
0 = Do not case-mix adjust the triple stacked
bars (i.e., top box, middle box, bottom
box).
1 = Case-mix adjust the triple stacked bars
(i.e., top box, middle box, bottom box)
and store the adjusted frequencies along
with the unadjusted frequencies.
Impute
Flag for imputation of
missing data for adjuster
variables. The default value is
0.
0 = Do not impute mean values by plan for all
adjuster variables.
1 = Impute mean values by plan for all
adjuster variables.
Proc_type
Assign the procedure type for
the case-mix regression. The
default value is 0.
0 = Run the case-mix model under PROC REG.
1 = Run the case-mix model under PROC
SURVEYREG.
Note: For PROC SURVEYREG, only a cluster
option is available, which sets CLUSTER = PLAN
in the CAHPS Analysis Program.
Saving Files
Bar_stat
Flag indicating if permanent
data sets for the case-mixed
frequencies should be saved.
The default value is 0.
0 = Do not save the statistical results in data
sets for the case-mix adjusted triple
stacked frequency bars (i.e., top box,
middle box, bottom box).
1 = Save the case-mix adjusted statistical
results in permanent data sets for the triple
stacked frequency bars (i.e., top box,
middle box, bottom box).
Kp_resid
Flag used to save the
residual values from the SAS
work data set RES_4_ID (in
the STD_DATA module) to a
permanent data set. The
residuals are the response
values after case-mix
adjustments have been made.
The default value is 0.
0 = Do not save the residual response values.
1 = Save the residual response values in a
permanent data set.
Id_resp
If there is a unique variable in
the data set that identifies
each individual respondent,
then this variable name may
be entered here. The default
value is blank.
Blank or the name of a variable in the data set.
This variable can be included in the residual data set
when kp_resid = 1. The variable will be stored as a
character variable with a maximum length of 50 characters.
Outregre
Flag indicating whether or not
to include the regression
output text created by SAS.
The default value is 0.
0 = No regression output appears in the text
file.
1 = Print out the regression output to the text
file.
Assigning Weights
Wgtplan
Specifies whether or not to
use plan weights for the
plan/entity-level statistical
test. The default value is 0.
Note: Set WGTPLAN = 1
when WGTMEAN is applied.
0 = Do not use the plan/entity weights when
computing the overall mean for the
comparison of plan/entity means. Equal
weighting will be used as in previous
versions of the macro.
1 = Use the weights from the variable specified
in the parameter wgtmean, summed to the
plan/entity level. This weight is
used for weighting the overall and grand
means used for the F statistic for
statistical comparisons of the plan/entity
means.
Wgtmean
Name of the variable storing
the weight values for the plan
means. The default value is
blank. Note: Set WGTPLAN
= 1 when this parameter is
applied.
Blank if no weight is assigned for the plan/entity
or
Specify the name of a variable used for the
plan/entity-level weight.
Wgtresp
Name of the variable storing
the weight values for
individual respondents. The
default value is blank.
Blank if no weight is assigned
or
Specify the name of a variable used for the response
weight.
Overall Weights
Overall_wt
Weight options for
calculating the overall mean.
The default value is 2.
0 = Number of respondents
1 = Equal weighting of PLANS (each plan is
assigned a weight of 1)
2 = Population: sum of respondent weights
(based on WGTMEAN)
Coefficient Weights
Wt_type
Weight options for
calculating the case-mix
regression coefficients. The
default value is 0.
0 = Number of respondents
1 = Equal weighting of PLANS
2 = Population: sum of respondent weights
(based on WGTRESP)
Composite Measure Weights
Even_wgt
Determines how to weight
items when calculating
composite measures. The
default value is 1.
0 = Weight by overall number of respondents
for each item.
1 = Use equal weighting of items in
composite measures
(1 / # of items).
2 = Weight by summing the respondent
weight (based on WGTRESP)
K
Assign a target minimum
number of responses for equal
weighting of items in
composite measures
(even_wgt = 1). The default
value is 1.
Number > 0.
Post-Stratification Weighting
Wgtdata
Specifies whether post-
stratification is used in
weighting. The default value
is 1 (no strata weight).
1 = Do not perform post-stratification
weighting.
2 = Combine strata, conduct post-stratification
weighting.
Note: A separate file is
required to specify the strata
variable (See Step 3b. Post-
Stratification Weighting
Case in Appendix A. Using
the Test Data in the CAHPS
Analysis Program).
Analyzing Data Separately
Subset
Perform the case-mix
adjustments and statistical test
based on each subset of plans;
the subset code is a column in
the plan detail file. The
default value is 1.
Note: A separate file is
required to specify the subset
variable (See Step 3b. Post-
Stratification Weighting
Case in Appendix A. Using
the Test Data in the CAHPS
Analysis Program).
1 = No subsetting done. Global case-mix
model and centering.
2 = Global case-mix model with centered
means for each subset before performing
statistical tests.
3 = Subset case-mix model with centered
means for each subset.
Significance Values
Pvalue
Level of significance for
comparisons. The default
value is 0.05.
Valid values are between 0 and 1.
Change
Level of practical significance
based on a percentage
difference from the minimum
absolute theoretical difference
from the overall mean (can be
used only with ‘p-value’
criteria). The default value
is 0.
Value between 0 and 1 (i.e., 25% is entered as 0.25).
Meandiff
Level of practical significance
based on absolute difference
between plan mean and mean
of all plans (can be used only
with ‘p-value’ criteria). The
default value is 0.
Number ≥ 0.
Example SAS Macro Call with Optional Parameters
The sample call statement below includes some optional parameters. Explanatory text is provided next to
each parameter in between the “/*” and “*/”. All text in between these characters is commented out or
ignored by SAS.
%cahps(
var = q05_re q07_re, /*Name of variables in the composite
measure to be analyzed*/
vartype = 1, /*Set the type of variables: 1=
dichotomous scale (1/0)*/
name = Sample Composite Measure, /*Label for the outcome variable*/
adultkid = 3, /*Specify how to analyze child and
adult surveys: 3= analyze adult data
only*/
adjuster = age1824 age2534 age3544
age5564 age6574 age75 ghs, /*List of case-mix adjuster variables
to include*/
adj_bars = 1, /*Flag for the frequencies to be case-
mix adjusted*/
bar_stat = 1, /*Flag to save case-mix adjusted
frequencies*/
impute = 1, /*Flag to impute case-mix adjuster
variables that are missing*/
dataset = test, /*Name of the input data*/
outname = CompositeMeasureName /*Name used for the output data set*/
) ;
For more examples of how to set these parameters, please refer to Appendix A.
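As a further illustration, the call below is a minimal sketch of how the weighting parameters described above might be combined for a single global rating item. The parameter names (vartype, adultkid, wgtresp, wgtmean, wgtplan, dataset, outname) come from the parameter table; the variable names q_rating, resp_wt, and plan_wt and the output name SampleRating are hypothetical placeholders. The call assumes the CAHPS macro has already been loaded in your SAS session.

```sas
%cahps(
var = q_rating,               /*Hypothetical 0-10 global rating
                                item*/
vartype = 2,                  /*Rating scale*/
name = Sample Global Rating,  /*Label for the outcome variable*/
adultkid = 3,                 /*3= analyze adult data only*/
wgtresp = resp_wt,            /*Hypothetical respondent-level
                                weight variable*/
wgtmean = plan_wt,            /*Hypothetical plan/entity-level
                                weight variable*/
wgtplan = 1,                  /*Set to 1 because WGTMEAN is
                                applied*/
dataset = test,               /*Name of the input data*/
outname = SampleRating        /*Hypothetical name used for the
                                output data sets*/
) ;
```

WGTPLAN is set to 1 here only because the parameter table notes that it should be when WGTMEAN is applied; all other optional parameters keep their defaults.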
Cases Dropped When Performing Analyses
The Analysis Program drops some cases with missing data when performing analyses. This section
uses a small data set with ten cases, two entities, two questions, and two case-mix adjuster variables to
demonstrate which cases will be used for the analyses.
This example follows two paths for the analysis of a composite measure consisting of two items: Q1 and
Q2. Run 1 uses no adjuster variables; Run 2 uses two adjuster variables, Adjuster 1 and Adjuster 2 (A1 and
A2 in the macro call), without imputing plan mean values for missing adjuster values. Note that each item is equally
weighted for the composite measure in both sample runs.
Run 1: CAHPS Macro Call Statement
%cahps(
var = Q1 Q2, /*Name of the two variables in the
composite measure to be analyzed*/
vartype = 3, /*Set the type of variable: 3= 4-point
scale*/
name = Sample Composite Measure, /*Label for the outcome variable*/
adultkid = 0, /*Specify how to analyze child and
adult surveys: 0= analyze all data;
all data is adult*/
dataset = test, /*Name of the input data*/
outname = SampCompositeName_Run1 /*Name used for the output data set*/
) ;
Run 2: CAHPS Macro Call Statement
%cahps(
var = Q1 Q2, /*Name of the two variables in the
composite measure to be analyzed*/
vartype = 3, /*Set the type of variable: 3= 4-point
scale*/
name = Sample Composite Measure, /*Label for the outcome variable*/
adultkid = 0, /*Specify how to analyze child and
adult surveys: 0= analyze all data;
all data is adult*/
adjuster = A1 A2, /*List of case-mix adjuster variables
to include*/
impute = 0, /*Flag for imputing missing case-mix
adjusters: 0= default, do not
impute*/
dataset = test, /*Name of the input data*/
outname = SampCompositeName_Run2 /*Name used for the output data set*/
) ;
The macro cleans the two items, Q1 and Q2, to make sure the values are within the valid range for the
given variable type. The macro call indicates that they are type 3 variables, which means the response values
must be a 1, 2, 3, or 4. Any other response value is set to missing. In our small data set, Q1 has a value of
7 (case 10) that is set to missing; all other values are fine. The adjuster values are not cleaned in the
macro, so all values are accepted.
The macro checks for missing values in each case and determines whether to keep the record based on the
macro parameter arguments. The results may differ depending on whether adjusters are used and whether
missing adjusters get an imputed mean value. The cases that are dropped for Run 1 and Run 2, and the
reasons why, are noted in the last two columns of the table below.
Please note that the variable PLAN refers to your entity of analysis. The periods (.) in Table 6.3 represent
missing values. The case numbers are not a part of the data set; they are used only for reference purposes
later.
Table 6.3 Original Data and Data Used for Analysis
Case | Plan | Composite Measure Item: Q1 | Composite Measure Item: Q2 | Adjuster: Adj 1 | Adjuster: Adj 2 | Run 1 (No Adj): Case Dropped | Run 2 (With Adj): Case Dropped
1 | A | 2 | 4 | 1 | 1 | |
2 | A | 3 | . | 2 | 2 | |
3 | A | 4 | 2 | 3 | . | | Dropped: Adj 2 Missing
4 | A | 4 | 3 | . | . | | Dropped: Adj 1 & Adj 2 Missing
5 | A | 3 | 3 | 2 | 3 | |
6 | B | 3 | 3 | 2 | 3 | |
7 | B | . | . | 4 | 5 | Dropped: Q1 & Q2 Missing | Dropped: Q1 & Q2 Missing
8 | B | 2 | 2 | 5 | 4 | |
9 | B | 3 | 2 | 6 | 3 | |
10 | B | . (7) | 3 | 3 | 3 | |
Note: The cases in Run 2 that were dropped because of missing adjuster information could be retained by using the parameter
“IMPUTE” to impute missing adjuster information (see Table 6.2).
The macro uses the cases retained from the first cleaning step: nine cases for Run 1 and seven cases for
Run 2. The macro reports and summarizes the number of cases used in each analysis, the percent missing
for each variable, and the percent breakdown of the response categories.
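The retention rules illustrated in Table 6.3 can be sketched in a short DATA step. This is not the macro's own code. It assumes, based on the table, that a case is kept when at least one composite item is answered and that, with adjusters and no imputation, both adjusters must also be present. The flag names keep_run1 and keep_run2 are hypothetical.

```sas
/* Sketch of the case-retention logic shown in Table 6.3.      */
/* keep_run1 and keep_run2 are hypothetical flag names.        */
data cases;
   input case plan $ q1 q2 a1 a2;

   /* Clean the items: for a 4-point scale, any value outside  */
   /* 1-4 is set to missing (case 10 has Q1 = 7).              */
   if q1 not in (1, 2, 3, 4) then q1 = .;
   if q2 not in (1, 2, 3, 4) then q2 = .;

   /* Run 1 (no adjusters): keep the case if at least one      */
   /* composite item is non-missing.                           */
   keep_run1 = (n(q1, q2) > 0);

   /* Run 2 (adjusters, no imputation): also require that      */
   /* neither adjuster is missing.                             */
   keep_run2 = (keep_run1 = 1 and nmiss(a1, a2) = 0);

   datalines;
1 A 2 4 1 1
2 A 3 . 2 2
3 A 4 2 3 .
4 A 4 3 . .
5 A 3 3 2 3
6 B 3 3 2 3
7 B . . 4 5
8 B 2 2 5 4
9 B 3 2 6 3
10 B 7 3 3 3
;
run;

/* Count the cases retained under each run */
proc means data=cases sum maxdec=0;
   var keep_run1 keep_run2;
run;
```

Under these assumptions the sums of keep_run1 and keep_run2 are 9 and 7, matching the counts of retained cases stated above.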
Risk of Out-of-Range Values for Case-mixed Scores
In the special cases where there are very few records for an analysis variable or all respondents answered
in only one or two response categories, it is possible for the case-mix adjusted values to be out of range.
For example, if all respondents to a Health Plan Survey answered “Yes,” where 0= “No” and 1= “Yes” to
a yes/no question, and the adjustment increases the mean score for that health plan, the adjusted mean for
that health plan would be greater than one. Further, the adjusted frequencies would be less than zero
percent for the “No” category and greater than 100 percent for the “Yes” category.
The macro does not force a change in these values, since it would change the mean of the means on the
adjusted scores but not on the unadjusted scores. When reporting your CAHPS survey results, it is
important to set these out-of-range values to the minimum or maximum value for that category. If
necessary, you can then make a manual adjustment to the adjacent category. For example, in the case of
three response categories, where the minimum frequency should be zero and the maximum should be 100,
suppose the case-mixed frequency results are as follows:
category 1 = -2.0,
category 2 = 25.0 and
category 3 = 77.0
The results could be manually adjusted so that
category 1 = 0.0,
category 2 = 23.0 and
category 3 = 77.0.
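The manual adjustment above can be sketched as a DATA step. This is one possible approach, not part of the CAHPS Analysis Program. The variable names cat1-cat3 and the data set name are hypothetical, and the sketch only handles a negative bottom category or a top category above 100, moving the difference into the adjacent middle category.

```sas
/* Sketch: bring out-of-range case-mix adjusted percentages    */
/* back into the 0-100 range. Names are hypothetical.          */
data adjusted_freq;
   input plan $ cat1 cat2 cat3;

   /* A negative bottom category is set to 0 and the shortfall */
   /* is taken from the adjacent middle category.              */
   if cat1 < 0 then do;
      cat2 = cat2 + cat1;
      cat1 = 0;
   end;

   /* A top category above 100 is set to 100 and the excess is */
   /* moved to the adjacent middle category.                   */
   if cat3 > 100 then do;
      cat2 = cat2 + (cat3 - 100);
      cat3 = 100;
   end;

   /* The three categories should still sum to 100 */
   total = cat1 + cat2 + cat3;

   datalines;
A -2.0 25.0 77.0
;
run;

proc print data=adjusted_freq noobs;
run;
```

For the example row, this yields category 1 = 0, category 2 = 23, and category 3 = 77. If the middle category itself becomes negative after the shift, the values still need to be reviewed by hand.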
7. SAS Data Sets Generated by the CAHPS Analysis Program
The CAHPS Analysis Program creates permanent SAS data sets that contain the results of the analyses
performed for each composite measure, single item measure, or global rating. Tables 7.1-7.5 provide a
listing of the SAS data sets created by the CAHPS Analysis Program as well as their naming conventions.
Tables 7.6-7.17 describe the variables included in the SAS data sets that have been created.
SAS Data Sets Output and Naming Conventions
The SAS data sets implement the following naming conventions where &OUTNAME is the text assigned
by the user to the variable “OUTNAME” in the CAHPS macro call.
Table 7.1 SAS Data Sets Output for All CAHPS Macro Calls
Description SAS Data Set Naming Convention
Lists any plans dropped by macro with only 0 or 1 records DP&OUTNAME
Lists any plans with less than 100 responses LR&OUTNAME
Unadjusted and adjusted (if adjusters are included) percentages for each response option (collapsed into three categories for global rating: 0-6, 7-8, and 9-10, or 0-7, 8-9, 10; for 4-Point Scales: 1-2, 3, and 4) N_&OUTNAME
Overall statistics for all entities combined OA&OUTNAME
Percentage missing on the VAR parameter (and Adjusters if included) P_&OUTNAME
Score details for all entities including significant differences SA&OUTNAME
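To see how the naming convention resolves, the short sketch below writes the resulting data set names to the SAS log. The value SampleRating is a hypothetical OUTNAME, and the macro variable here only stands in for the OUTNAME parameter of the CAHPS macro call.

```sas
/* Hypothetical OUTNAME value used only for illustration */
%let outname = SampleRating;

/* Each prefix from Table 7.1 is placed directly in front of   */
/* the OUTNAME text.                                           */
%put Plans dropped:             DP&outname;
%put Plans with few responses:  LR&outname;
%put Response percentages:      N_&outname;
%put Overall statistics:        OA&outname;
%put Percent missing:           P_&outname;
%put Plan-level score details:  SA&outname;
```

With this value, the first line resolves to DPSampleRating and the last to SASampleRating.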
Table 7.2 shows the data sets that are created by adding in case-mix adjusters.
Table 7.2 Additional SAS Data Sets Output from Case-Mix Adjustment
Description
SAS Data Set
Naming Convention
Regression coefficients for each adjuster variable C_&OUTNAME
R-squared values R2&OUTNAME
Residual values (only if parameter KP_RESID = 1) Y_&OUTNAME
Table 7.3 shows the data files created if the user keeps permanent data sets for case-mix adjusted
frequencies using the options:
ADJ_BARS= 1
BAR_STAT = 1 and
the stratified weighting option is not used (WGTDATA = 1).
If VARTYPE = 5, there will be additional SAS Data Sets created for each response option.
Table 7.3 Additional SAS Data Sets Output from Saving Case-Mix Adjusted
Frequencies
Description
SAS Data Set
Naming Convention
Score details for all entities for
Bottom box for Global Rating: 0-6 or 0-7; for 4-Point scales: 1-2; for 3-
Point scales: 1
Top box for Yes/No scales: Yes
B1&OUTNAME
Score details for all entities for
Middle box for Global Rating: 7-8 or 8-9; for 4-Point scales: 3; for 3-
Point scales: 2
Bottom box for Yes/No scales: No
B2&OUTNAME
Score details for all entities
Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; and for 3-
Point scales: 3
This data set will not appear when the VAR is dichotomous.
B3&OUTNAME
Overall statistics
Bottom box for Global Rating: 0-6 or 0-7; for 4-Point scales: 1-2; for 3-
Point scales: 1
Top box for Yes/No scales: Yes
F1&OUTNAME
Overall statistics
Middle box for Global Rating: 7-8 or 8-9; for 4-Point scales: 3; for 3-
Point scales: 2
Bottom box for Yes/No scales: No
F2&OUTNAME
Overall statistics for
Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; and for 3-
Point scales: 3
This data set will not appear when the VAR is dichotomous.
F3&OUTNAME
Table 7.4 shows the additional data sets that are created by adding in the post-stratification weighting
option (WGTDATA = 2). These data sets provide similar information to the data sets produced when not
selecting the option, except they provide the post-stratified weighted results. Specifically, when using the
WGTDATA=2 option, the core SAS data sets (e.g., N_&OUTNAME, OA&OUTNAME,
P_&OUTNAME) provide the results for each stratum, while the data sets listed in Table 7.4 are based on the
results weighted by the strata (i.e., the post-stratified weighted results).
Table 7.4 Additional SAS Data Sets Output from Post-Stratification Weighting Option
Description SAS Data Set Naming Convention
Unadjusted and adjusted (if adjusters are included) post-stratified weighted percentages of each response (similar to N_&OUTNAME) NW&OUTNAME
Overall statistics for all entities combined (similar to OA&OUTNAME but with post-stratified weighted results) OW&OUTNAME
Percentage missing on the VAR parameter (similar to P_&OUTNAME) PW&OUTNAME
Score details for all entities including significant differences (similar to SA&OUTNAME but with post-stratified weighted results) SW&OUTNAME
Table 7.5 provides the data files created if the following parameter statements are included to keep
permanent data sets for case-mix adjusted frequencies:
ADJ_BARS= 1,
BAR_STAT = 1, and
the post-stratification weighting option is used (WGTDATA = 2).
These data sets provide similar information to the data sets produced when not selecting the post-
stratification weighting option, except they provide the post-stratified weighted results. Specifically, when
using the WGTDATA=2 option, the core SAS data sets (e.g., B1, B2 and B3&OUTNAME, F1, F2 and
F3&OUTNAME) provide the results for each stratum, while the data sets listed in Table 7.5 are based on
the results weighted by the strata (i.e., the post-stratified weighted results). If VARTYPE = 5, there will be
additional SAS Data Sets created for each response option.
Table 7.5 Additional SAS Data Sets Output from Saving Case-Mix Adjusted
Frequencies for Post-Stratification Weighting Option
Description
SAS Data Set
Naming Convention
Post-stratification weighted score details for all entities
Bottom box for Global Rating: 0-6 or 0-7; for 4-Point scales: 1-2; for 3-
Point scales: 1
Top box for Yes/No scales: Yes
BA&OUTNAME
Post-stratification weighted score details for all entities
Middle box for Global Rating: 7-8 or 8-9; for 4-Point scales: 3; for 3-
Point scales: 2
Bottom box for Yes/No scales: No
BB&OUTNAME
Post-stratification weighted score details for all entities
Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; for 3-Point
scales: 3
This data set will not appear when the VAR is dichotomous.
BC&OUTNAME
Post-stratification weighted overall statistics
Bottom box for Global Rating: 0-6 or 0-7; for 4-Point scales: 1-2; for 3-
Point scales: 1
Top box for Yes/No scales: Yes
FA&OUTNAME
Post-stratification weighted overall statistics
Middle box for Global Rating: 7-8 or 8-9; for 4-Point scales: 3; for 3-
Point scales: 2
Bottom box for Yes/No scales: No
FB&OUTNAME
Post-stratification weighted overall statistics
Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; for 3-Point
scales: 3
This data set will not appear when the VAR is dichotomous.
FC&OUTNAME
Contents of CAHPS Analysis Program SAS Data Sets
The following tables (7.6 – 7.11) list the contents of the CAHPS Analysis Program SAS data sets created
for all macro runs. These data sets implement the following naming conventions where &OUTNAME is
the text assigned by the user to the parameter “OUTNAME” in the CAHPS macro call.
Table 7.6 Contents of DP&OUTNAME: Plans Dropped
Variable name Description
ALLN Total number of respondents in the data set by PLAN
ORIGPLAN PLAN dropped from analysis as there were fewer than 2 records
USEN Number of usable records for PLAN
Table 7.7 Contents of LR&OUTNAME: Lists Plans with 100 or Fewer Records
Variable name Description
ALLN Total number of respondents in the data set by PLAN
NEWPLAN PLAN name (If stratification case, this is the unstratified entity.
Otherwise, this variable contains the same entity as ORIGPLAN)
NPLAN_ID New Plan ID (If stratification case, this is the unstratified entity.
Otherwise, this variable contains the same entity as OPLAN_ID)
OPLAN_ID Original Plan ID (If stratification case, this will be different from
NPLAN_ID)
ORIGPLAN PLAN name (If stratification case, this will be different from
NEWPLAN)
PLAN Entity ID (This is the same as OPLAN_ID. If stratified weight option is
not selected, NPLAN_ID is also the same as entity ID. If subsetting is
not used, SPLAN_ID is the same as entity ID )
PLANTXT Entity ID and Entity name
SPLAN_ID Plan ID (If subsetting is used, the ID number will be created by each
subset. If no subsetting is used, this will be the same as OPLAN_ID)
STRATWGT Strata Weight (If no stratification is used, this will be 1)
SUB_ID Subset ID (If no subset is used, this will be 1)
SUBCODE If subsetting is used, the subset name or code is found in this column.
Otherwise it defaults to 1.
USEN Number of usable records for PLAN
USENTXT Number of usable records for PLAN (stored as a character variable)
Table 7.8 Contents of N_&OUTNAME: Response Option Percentages
Variable name Description
ALLN Total number of respondents in the data set by PLAN
OPLAN_ID Original PLAN ID
PLANNAME Entity name
PTRES1 Percent response for
Bottom box for Global Rating: 0-6 or 0-7; for 4-Point scales: 1-2;
for 3-Point scales: 1
Top box for Yes/No scales: Yes
PTRES2 Percent response for
Middle box for Global Rating: 7-8 or 8-9; for 4-Point scales: 3; for
3-Point scales: 2
Bottom box for Yes/No scales: No
PTRES3 Percent response for
Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; for 3-
Point scales: 3
This variable will not appear when the VAR is dichotomous. If
VARTYPE = 5, there will be additional response percentages for each
response option.
SUBCODE If subsetting is used, the subset name or code is found in this column.
Otherwise it defaults to 1.
USEN Number of usable records for PLAN
Additional Variables Included When ADJ_BARS = 1
ADJ_1 Case-mix adjusted percentages for
Bottom box for Global Rating: 0-6 or 0-7; for 4-Point scales: 1-2;
for 3-Point scales: 1
Top box for Yes/No scales: Yes
ADJ_2 Case-mix adjusted percentages for
Middle box for Global Rating: 7-8 or 8-9; for 4-Point scales: 3; for
3-Point scales: 2
Bottom box for Yes/No scales: No
ADJ_3 Case-mix adjusted percentages for
Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; and
for 3-Point scales: 3
This variable will not appear when the VAR is dichotomous. If
VARTYPE = 5, there will be additional case-mix adjusted percentages
for each response option.
If WGTMEAN is not assigned, then WGT_1-WGT_3 will be the same as PTRES1-PTRES3
WGT_1 Unadjusted and weighted percentages for
Bottom box for Global Rating: 0-6 or 0-7; for 4-Point scales: 1-2;
for 3-Point scales: 1
Top box for Yes/No scales: Yes
WGT_2 Unadjusted and weighted percentages for
Middle box for Global Rating: 7-8 or 8-9; for 4-Point scales: 3; for
3-Point scales: 2
Bottom box for Yes/No scales: No
WGT_3 Unadjusted and weighted percentages for
Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; and for
3-Point scales: 3
This variable will not appear when the VAR is dichotomous. If
VARTYPE = 5, there will be additional unadjusted and weighted
percentages for each response option.
Table 7.9 Contents of OA&OUTNAME: Results from Test for Significant Differences
Variable name Description
DFE The denominator degrees of freedom
DFR The numerator degrees of freedom
GM Grand mean used for F statistics
NTOT Number of respondents analyzed
OV_MEAN The mean of all the PLAN means
OVERALLF The results of the F-test on the null hypothesis for no difference between
entity means
OVERALLP P-value of the F distribution. If the P-value is less than 0.05 (or other
preferred value), the PLAN means are significantly different.
SUBCODE If subsetting is used, the subset name or code is found in this column.
Otherwise it defaults to GLOBAL.
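The relationship among OVERALLF, DFR, DFE, and OVERALLP can be sketched with the SAS PROBF function. The sketch assumes that OVERALLP is the upper-tail probability of the F distribution, which is the usual definition but is not confirmed here. The input values and the data set name oa_demo are made up for illustration and are not output from the macro.

```sas
/* Sketch: recompute an F-test p-value from the F statistic    */
/* and its degrees of freedom. All values are illustrative.    */
data oa_demo;
   overallf = 3.2;   /* hypothetical F statistic               */
   dfr      = 4;     /* numerator degrees of freedom           */
   dfe      = 495;   /* denominator degrees of freedom         */

   /* PROBF returns the lower-tail probability, so subtract    */
   /* it from 1 to obtain the p-value.                         */
   overallp = 1 - probf(overallf, dfr, dfe);

   /* Flag a significant difference at the default 0.05 level  */
   significant = (overallp < 0.05);
run;

proc print data=oa_demo noobs;
run;
```

Replace 0.05 with the value passed to the PVALUE parameter if a different significance level was used.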
Table 7.10 Contents of P_&OUTNAME: Percent Missing Data
Variable name Description
ALLN Total number of respondents in the data set by PLAN
PLAN Entity ID
PLANNAME Entity name
&VAR The percent of responses on the VAR variable(s) that are missing by
PLAN
SUBCODE If subsetting is used, the subset name or code is found in this column.
Otherwise it defaults to 1.
Additional variables included when ADJUSTER(S) are included
A separate variable for each adjuster is included in the results. This example shows 4 adjuster
variables.
ADJUSTER1 The percent of responses on the Adjuster 1 variable that are missing by
PLAN
ADJUSTER2 The percent of responses on the Adjuster 2 variable that are missing by
PLAN
ADJUSTER3 The percent of responses on the Adjuster 3 variable that are missing by
PLAN
ADJUSTER4 The percent of responses on the Adjuster 4 variable that are missing by
PLAN
Table 7.11 Contents of SA&OUTNAME: Plan Level Results
Variable name Description
ADJ_MEAN Weighted (if assigned) and adjusted plan mean for case-mix adjuster
variables. If no adjuster or weighting is selected, this will match
UNA_MEAN.
ALLN Total number of respondents in the data set by PLAN
CL95 Half-width of the 95% confidence interval, calculated as 1.96*SE. The
true (population) value of the estimate (DELTA) falls within the interval
(Estimate - CL95, Estimate + CL95) with 95% confidence.
DELTA The difference between the PLAN mean and overall mean
MEANING Rating of plan performance for the VAR variable based on a comparison
of the “Plan Mean” to the “Overall Mean.”
Identifies statistically meaningful differences:
1 = Plan was significantly below average
2 = Plan was not significantly above or below average
3 = Plan was significantly above average
PLAN_WGT Value of the PLAN weight. If weight not assigned, then this column has
PLAN_WGT = 1.
PLANNAME Entity name
SE Standard error of “Plan Difference From Mean” or Delta
SUBCODE If subsetting is used, the subset name or code is found in this column.
Otherwise it defaults to 1.
UNA_MEAN Weighted (if assigned) and unadjusted plan mean (even if case-mix
adjuster variables are included)
USEN Number of usable records for PLAN
UWT_MEAN Unweighted and unadjusted plan mean (even if weighting is assigned
and case-mix adjuster variables are included)
VP Variance of the plan means
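The CL95 definition in Table 7.11 can be illustrated with a short DATA step that builds the 95% confidence interval around DELTA. The plan names and the DELTA and SE values are hypothetical, and the variables lower and upper are added here for illustration; they are not part of SA&OUTNAME.

```sas
/* Sketch: 95% confidence interval for the difference between  */
/* the plan mean and the overall mean. Values are made up.     */
data sa_demo;
   input planname $ delta se;

   /* Half-width of the interval, as defined for CL95 */
   cl95 = 1.96 * se;

   /* Interval bounds (hypothetical variable names) */
   lower = delta - cl95;
   upper = delta + cl95;

   datalines;
PlanA 0.12 0.05
PlanB -0.03 0.04
;
run;

proc print data=sa_demo noobs;
run;
```

The same two assignment statements could be applied to the SA&OUTNAME data set itself, since it already contains DELTA and CL95.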
Six additional files are created when case-mix adjustment is performed AND
the ADJ_BARS= 1 and BAR_STAT = 1 options are chosen, and
no post-stratification weighting is performed (WGTDATA = 1).
The first three data sets have the same variables described in SA&OUTNAME, which provides the PLAN
level overall results (see Table 7.11) but these additional files provide results for each response option
(collapsed).
B1&OUTNAME provides the statistics for the:
Bottom box for the Global Rating (Response options 0-6 or 0-7), 4-Point scales
(Response options 1-2), or 3-Point scales (Response option 1).
Top box for the dichotomous Yes/No scales (Yes response option).
B2&OUTNAME provides the statistics for the:
Middle box for the Global Rating (Response options 7-8 or 8-9), 4-Point scales
(Response option 3), or 3-Point scales (Response option 2).
Bottom box for the dichotomous Yes/No scales (No response option).
B3&OUTNAME provides the statistics for the:
Top box for the Global Rating (Response options 9-10 or 10), 4-Point scales
(Response option 4), or 3-Point scales (Response option 3).
For dichotomous variables, this data set is not created.
The second three data sets have the same variables described in OA&OUTNAME, which provides overall
results for all PLANS combined (see Table 7.9), but these additional files provide results for each
response option (collapsed).
F1&OUTNAME provides the results from the tests for significant differences between entities
for the:
Bottom box for the Global Rating (Response options 0-6 or 0-7), 4-Point scales
(Response options 1-2), or 3-Point scales (Response option 1).
Top box for dichotomous Yes/No scales (Yes response option).
F2&OUTNAME provides the results from the tests for significant differences between entities
for the:
Middle box for the Global Rating (Response options 7-8 or 8-9), 4-Point scales
(Response option 3), or 3-Point scales (Response option 2).
Bottom box for dichotomous Yes/No scales (No response option).
F3&OUTNAME provides the results from the tests for significant differences between entities
for the:
Top box for the Global Rating (Response options 9-10 or 10), 4-Point scales
(Response option 4), or 3-Point scales (Response option 3).
For dichotomous variables, this data set is not created.
Tables 7.12 – 7.14 list the contents of the additional SAS data sets produced when using case-mix
adjusters.
Table 7.12 Contents of C_&OUTNAME: Case-mix Adjustment Regression
Coefficients
Variable name Description
COE_&OUTNAME Case-mix adjustment regression coefficients
P_&OUTNAME P-value of case-mix adjustment regression coefficient
SE_&OUTNAME Standard error of case-mix adjustment regression coefficient
SUBCODE If subsetting is used, the subset name or code is found in this column.
Otherwise it defaults to GLOBAL.
VARIABLE Name of case-mix adjuster variable(s)
Table 7.13 Contents of R2&OUTNAME – R-Squared Results for the Case-mix
Adjustment Regression
Variable name Description
_ADJRSQ_ The adjusted R-squared value from the regression for the dependent
variable (OUTNAME variable)
_DEPVAR_ Name of the OUTNAME variable
_RSQ_ The R-squared value from the regression for the dependent variable
(OUTNAME variable)
SPLIT This variable is used when data is split into two groups. If the data is
not split, it defaults to 0.
SUBCODE If subsetting is used, the subset name or code is found in this column.
Otherwise it defaults to GLOBAL.
Table 7.14 lists the contents of the additional SAS data set produced when using case-mix adjuster
variables and when KP_RESID = 1.
Table 7.14 List of Macro Results – Y_&OUTNAME
Variable name Description
_ID Individual assigned ID
_ID_RESP Original Respondent ID
ITEMNO_MAX Number of items (for single measures this will equal 1; for composite
measures this will equal the number of items in the composite measure)
PLAN Entity ID
YRESID Residual
Tables 7.15 – 7.17 provide the contents of the data sets produced when using the post-stratification
weighting option.
Table 7.15 List of Macro Results – NW&OUTNAME (Similar to N_&OUTNAME but
provides the post-stratification weighted results)
Variable name Description
ALLN Total number of respondents in the data set by PLAN
OPLAN_ID Original PLAN ID
PLANNAME Entity name
PTRES1 Post-stratification weighted percentage for
Bottom box for Global Rating: 0-6 or 0-7; for 4-Point scales: 1-2; for
3-Point scales: 1
Top box for Yes/No scales: Yes
PTRES2 Post-stratification weighted percentage for
Middle box for Global Rating: 7-8 or 8-9; for 4-Point scales: 3; for 3-
Point scales: 2
Bottom box for Yes/No scales: No
PTRES3 Post-stratification weighted percentage for
Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; for 3-
Point scales: 3
This variable will not appear when the VAR is dichotomous. If
VARTYPE = 5, there will be additional post-stratification weighted
percentages for each response option.
SUBCODE If subsetting is used, the subset name or code is found in this column.
Otherwise it defaults to 1.
USEN Number of usable records for PLAN
Additional Variables Included When ADJ_BARS = 1
ADJ_1 Post-stratification weighted case-mix adjusted percentages for
Bottom box for Global Rating: 0-6 or 0-7; for 4-Point scales: 1-2; for
3-Point scales: 1
Top box for Yes/No scales: Yes
ADJ_2 Post-stratification weighted case-mix adjusted percentages for
Middle box for Global Rating: 7-8 or 8-9; for 4-Point scales: 3; for
3-Point scales: 2
Bottom box for Yes/No scales: No
ADJ_3 Post-stratification weighted case-mix adjusted percentages for
Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; for 3-
Point scales: 3
This variable will not appear when the VAR is dichotomous. If
VARTYPE = 5, there will be additional post-stratification weighted
case-mix adjusted percentages for each response option.
If WGTMEAN is not assigned, then WGT_1-WGT_3 will be the same as PTRES1-PTRES3
WGT_1 Unadjusted, weighted, and post-stratification weighted percentages for
Bottom box for Global Rating: 0-6 or 0-7; for 4-Point scales: 1-2;
for 3-Point scales: 1
Top box for Yes/No scales: Yes
WGT_2 Unadjusted, weighted, and post-stratification weighted percentages for
Middle box for Global Rating: 7-8 or 8-9; for 4-Point scales: 3; for 3-
Point scales: 2
Bottom box for Yes/No scales: No
WGT_3 Unadjusted, weighted, and post-stratification weighted percentages for
Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; for 3-
Point scales: 3
This variable will not appear when the VAR is dichotomous. If
VARTYPE = 5, there will be additional unadjusted, weighted, and post-
stratification weighted percentages for each response option.
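As a quick check of this output, the sketch below prints the post-stratification weighted box percentages for each entity. It assumes the macro was called with OUTNAME = rprov (as in the Appendix A examples), so the data set is named NWrprov, and that the data set sits in the WORK library; both are assumptions, so add a libref if your run wrote it elsewhere.

```sas
/* Sketch: list the post-stratification weighted percentages by entity. */
/* Assumes OUTNAME = rprov and that NWrprov is in the WORK library.     */
proc print data = NWrprov noobs;
  var planname subcode alln usen ptres1 ptres2 ptres3;
run;
```

For a dichotomous VAR, drop PTRES3 from the VAR statement, because that variable is not created.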
Table 7.16 Contents of OW&OUTNAME (Similar to OA&OUTNAME but provides
post-stratification weighted results)
Variable name Description
DFE The denominator degrees of freedom
DFR The numerator degrees of freedom
GM Grand mean used for F statistics
NTOT # of respondents analyzed
OV_MEAN The mean of all the PLAN means
OVERALLF The results of the F-test on the null hypothesis for no difference between
entity means
OVERALLP P-value of the F distribution. If the P-value is less than 0.05 (or other
preferred value), the PLAN means are significantly different.
SUBCODE If subsetting is used, the subset name or code is found in this column.
Otherwise it defaults to GLOBAL.
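The overall test can be read directly from this data set. The sketch below writes one line per subset to the SAS log, using the 0.05 threshold mentioned above. The data set name OWrprov assumes OUTNAME = rprov and the WORK library; the explicit check for a missing p-value is there because SAS treats a missing value as smaller than any number.

```sas
/* Sketch: report the overall F-test for each subset.                 */
/* Assumes OUTNAME = rprov and that OWrprov is in the WORK library.   */
data _null_;
  set OWrprov;
  if missing(overallp) then
    put "Subset " subcode " has no p-value.";
  else if overallp < 0.05 then
    put "Subset " subcode " entity means differ: F = " overallf 8.2 " p = " overallp 6.4;
  else
    put "Subset " subcode " no significant difference: F = " overallf 8.2 " p = " overallp 6.4;
run;
```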
Table 7.17 Contents of SW&OUTNAME (Similar to SA&OUTNAME but provides post-
stratification weighted results)
Variable name Description
ADJ_MEAN Plan mean that is post-stratification weighted, weighted (if assigned by
WGTRESP), and adjusted for the case-mix adjuster variables. If no adjuster or
weighting is selected, this will match UNA_MEAN.
ALLN Total number of respondents in the data set by PLAN
CL95 Half-width of the 95% confidence interval, calculated as 1.96*SE. The
true (population) value of the estimate (DELTA) falls within the interval
(Estimate - CL95, Estimate + CL95) with 95% confidence.
DELTA The difference between plan mean and overall mean
MEANING Rating of plan performance for the VAR variable based on a comparison
of plan’s adjusted and post-stratification weighted “Plan Mean” to
“Overall Mean.”
Identifies statistically meaningful differences.
1 = Plan was significantly below average
2 = Plan was not significantly above or below average
3 = Plan was significantly above average
PLAN_WGT Value of plan weight. If a weight is not assigned, then this column has
PLAN_WGT = 1.
PLANNAME Entity name
SE Standard error of “Plan Difference From Mean” or Delta
SUBCODE If subsetting is used, the subset name or code is found in this column.
Otherwise it defaults to 1.
UNA_MEAN Weighted (if assigned by WGTRESP) and unadjusted plan mean (even if
case mix adjuster variables are included)
USEN Number of usable records for PLAN
UWT_MEAN Unweighted, unadjusted, and post-stratification weighted plan mean
(even if weighting is assigned and case mix adjuster variables are
included)
VP Variance of the plan means
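The relationship among DELTA, CL95, and MEANING can be made concrete with a short sketch that rebuilds the 95% interval around each entity's difference from the overall mean and labels the MEANING codes listed in the table. The data set name SWrprov (OUTNAME = rprov, WORK library), the derived names lower95 and upper95, and the assumption that MEANING is stored as a numeric variable are all illustrative.

```sas
/* Sketch: confidence limits for DELTA and labels for MEANING.        */
/* Assumes OUTNAME = rprov, WORK library, and a numeric MEANING.      */
proc format;
  value meaning
    1 = 'Significantly below average'
    2 = 'Not significantly above or below average'
    3 = 'Significantly above average';
run;

data sw_ci;
  set SWrprov;
  lower95 = delta - cl95;   /* Estimate - CL95 */
  upper95 = delta + cl95;   /* Estimate + CL95 */
  format meaning meaning.;
run;

proc print data = sw_ci noobs;
  var planname adj_mean delta se lower95 upper95 meaning;
run;
```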
Six additional files are created when case-mix adjustment is performed,
the ADJ_BARS = 1 and BAR_STAT = 1 options are chosen, and
post-stratification weighting is performed (WGTDATA = 2).
The first three data sets have the same variables described in SW&OUTNAME, which provides the
PLAN level overall results (see Table 7.17), as well as in B1, B2, and B3&OUTNAME. These additional
files provide results for each response option (collapsed).
BA&OUTNAME provides the post-stratification weighted statistics for the:
Bottom box for the Global Rating (Response options 0-6 or 0-7), 4-Point scales
(Response options 1-2), or 3-Point scales (Response option 1).
Top box for dichotomous Yes/No scales (Yes response option).
BB&OUTNAME provides the post-stratification weighted statistics for the:
Middle box for the Global Rating (Response options 7-8 or 8-9), 4-Point scales
(Response option 3), or 3-Point scales (Response option 2).
Bottom box for dichotomous Yes/No scales (No response option).
BC&OUTNAME provides the post-stratification weighted statistics for the:
Top box for the Global Rating (Response options 9-10 or 10), 4-Point scales
(Response option 4), or 3-Point scales (Response option 3).
For dichotomous variables, this data set is not created.
The second three data sets have the same variables described in OW&OUTNAME, which provides
overall results for all PLANS combined (see Table 7.16), as well as in the F1-F3&OUTNAME data sets.
These additional files provide results for each response option (collapsed).
FA&OUTNAME provides the post-stratification weighted results from the tests for significant
differences between entities for the:
Bottom box for the Global Rating (Response options 0-6 or 0-7), 4-Point scales
(Response options 1-2) or 3-Point scales (Response option 1).
For dichotomous Yes/No scales it provides the top box or Yes response option.
FB&OUTNAME provides the post-stratification weighted results from the tests for significant
differences between entities for the:
Middle box for the Global Rating (Response options 7-8 or 8-9), 4-Point scales
(Response option 3), or 3-Point scales (Response option 2).
Bottom box for dichotomous Yes/No scales (No response option).
FC&OUTNAME provides the post-stratification weighted results from the tests for significant
differences between entities for the:
Top box for the Global Rating (Response options 9-10 or 10), 4-Point scales
(Response option 4), or 3-Point scales (Response option 3).
For dichotomous variables, this data set is not created.
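To confirm which of the six data sets a given run produced, a small helper macro can loop over the prefixes and test for each data set. The macro name check_poststrat_output is hypothetical, and the sketch assumes the outputs are in the WORK library and that OUTNAME = rprov.

```sas
/* Sketch: report which BA/BB/BC/FA/FB/FC data sets exist.            */
/* The macro name is hypothetical; one-level names refer to WORK.     */
%macro check_poststrat_output(outname);
  %local prefixes i prefix;
  %let prefixes = BA BB BC FA FB FC;
  %do i = 1 %to 6;
    %let prefix = %scan(&prefixes, &i);
    %if %sysfunc(exist(&prefix.&outname)) %then
      %put NOTE: &prefix.&outname exists.;
    %else
      %put NOTE: &prefix.&outname was not found.;
  %end;
%mend check_poststrat_output;

%check_poststrat_output(rprov);
```

For a dichotomous variable, the BC and FC data sets are expected to be reported as not found.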
Appendix A. Using the Test Data in the CAHPS Analysis Program
This appendix explains how to use the test programs and data set provided in the ZIP file for the CAHPS
Analysis Program version 5.0:
MACRO_CAHPS50.SAS - This is the core SAS macro program that performs the analyses the
user specifies in the SAS test program. The macro file should not be modified.
_1_TEST_FORMAT_CAHPS50.SAS - This program creates formats, which let you
view the data with descriptive labels instead of the numeric values stored in the data (e.g.,
“Always” is shown rather than a “4”).
_2_TEST_PREPDATA_CAHPS50.SAS - This program contains sample code to create
recoded versions of some variables for the macro run (e.g., by creating dichotomous or reversed-
coded variables).
_3A_TEST_CAHPS50.SAS - This short program provides sample code for calling the macro
program with different analysis options and outputs specified.
_3B_TEST_CAHPS50_STRATIFIED.SAS - This program contains sample code for calling
the macro with the post-stratification weighting option.
TEST_CAHPS50_DATA.SAS7BDAT - This sample SAS data set is used with all the test
programs listed above.
TEST_CAHPS50_DATA_recoded.SAS7BDAT - This sample SAS data set is similar to
TEST_CAHPS50_DATA.SAS7BDAT, except that the recoded variables created by
_2_TEST_PREPDATA_CAHPS50.sas have already been created for you. This SAS data set is
for users who do not need to use _2_TEST_PREPDATA_CAHPS50.sas.
Before you begin, you need to assign two different library paths for the input and output data. Sample
code is listed below. The formats are stored in the input data set directory, so the LIBRARY libref
points to the same path as the input library.
%let ProgramName = Test_cahps50 ;
%let root = /data/cahps/analysis_program/version5.0 ;
libname in "&root./data/" ;
libname out "&root./Test_cahps50/" ;
libname library "&root./data/" ;
Once you have determined the location of the input and output data, you can begin to use each test
program following the steps below.
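Before running the test programs, it can help to confirm that the librefs resolve and that the test data set is visible. The sketch below assumes the LIBNAME statements above have been submitted and that the test data set has been copied into the directory assigned to IN.

```sas
/* Sketch: confirm the library assignments before running the test programs. */
%put NOTE: IN points to %sysfunc(pathname(in));
%put NOTE: OUT points to %sysfunc(pathname(out));

/* List the variable names in the test data set. */
proc contents data = in.test_cahps50_data short;
run;
```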
Step 1. Creating the Format Catalog (_1_TEST_FORMAT_CAHPS50)
To use the test data, you should first run _1_TEST_FORMAT_CAHPS50.sas to create the formats.
Within the program, you will need to assign a library name for storing the format file as shown below.
Note that the directory for the formats should have the same path as the data set used for the macro run.
%let root = /data/cahps/analysis_program/version5.0 ;
libname in "&root./data/your format path here" ;
Below is an excerpt from the Test Format Catalog.
proc format library = in ; /*place the library name where you want to
store the format*/
title "CAHPS Survey Formats for TEST Data Set Version &version " ;
value ynb /*Provides the labeled response categories for each response option
for all variables assigned the format ynb*/
. = ' .: Missing '
1 = ' 1: Yes '
2 = ' 2: No '
98 = '98: Inapplicable '
99 = '99: No Answer Given '
;
value edu /*Provides the labeled response categories for each response option
for all variables assigned the format edu*/
1=' 1: <= 8TH GRADE'
2=' 2: SOME HS'
3=' 3: HS GRAD/GED'
4=' 4: SOME COLLEGE/2-YR DEGREE'
5=' 5: 4-YR COLLEGE GRAD'
6=' 6: >4-YR COLLEGE DEGREE'
98="98: DON'T KNOW"
99='99: REFUSED' ;
run;
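Once the format catalog exists, the formats can be attached in any procedure to display labels instead of codes. The sketch below is illustrative only: it assumes the catalog was written to the IN library, and that Q05 uses the ynb format and Q27 the edu format, which matches their response options in Table A.2 but is not confirmed by the excerpt.

```sas
/* Sketch: display labeled frequencies using the formats created in Step 1.  */
/* Assumes the format catalog is in the IN library; the variable-to-format   */
/* pairing (q05 with ynb, q27 with edu) is an assumption.                    */
options fmtsearch = (in);

proc freq data = in.test_cahps50_data;
  tables q05 q27 / missing;
  format q05 ynb. q27 edu.;
run;
```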
Step 2. Preparing the Data for the Macro (_2_TEST_PREPDATA_CAHPS50)
The _2_TEST_PREPDATA_CAHPS50.sas program demonstrates sample SAS code to prepare the data
set before the macro run. More resources can be found in Preparing Data from CAHPS Surveys for
Analysis (available on the AHRQ CAHPS web page about analyzing CAHPS survey data).
The sample code provided in the TEST_PREPDATA file is intended to work with the TEST data set.
You can utilize this code for your own data sets, but will need to make modifications to the statements
depending on the variable names and variable response options in your data set.
1. Set permanent or temporary SAS data set.
data adult;
set in.test_cahps50_data;
2. Recodes numeric plan variables to character to simplify interpretation of
the result tables.
length plan $ 16 ;
if planid = 1 then plan = 'PRACTICE_A_URBAN' ;
else if planid = 2 then plan = 'PRACTICE_B_URBAN' ;
else if planid = 3 then plan = 'PRACTICE_C_URBAN' ;
else if planid = 4 then plan = 'PRACTICE_B_RURAL' ;
else if planid = 5 then plan = 'PRACTICE_C_RURAL' ;
3. Recodes dichotomous variables from 1-2 to 1-0, so that the largest number
represents the most positive response.
array org q05 q26;
array rev q05_re q26_re;
do i = 1 to dim ( rev ) ;
if org [i] in (1, 2) then rev [i] = 2 - org [i] ;
else rev [i] = . ;
end ;
4. Reverse codes items in which the lowest value is the most positive response and the
highest value is the most negative response (here, 5-point items coded 1-5).
array org2 q23 q24;
array rev2 q23_re q24_re;
do i = 1 to dim ( rev2 ) ;
if org2 [i] in (1, 2, 3, 4, 5)
then rev2 [i] = 6 - org2 [i] ;
else rev2 [i] = . ;
end ;
5. Cleans the age and general health status variables for out-of-range values.
age = q25;
ghr = q27;
if age not in (1, 2, 3, 4, 5, 6, 7) then age = . ;
if ghr not in (1, 2, 3, 4, 5, 6) then ghr = . ;
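After the preparation step, a cross-tabulation of each original variable against its recoded version is a simple way to confirm the recodes behaved as intended. The sketch below assumes the DATA step above has been closed with a RUN statement and that the resulting data set is named adult, as in step 1 of the sample code.

```sas
/* Sketch: verify the recodes from Step 2.                                */
/* Assumes the DATA step above was ended with RUN and created WORK.ADULT. */
proc freq data = adult;
  /* Each original value should map to exactly one recoded value. */
  tables q05 * q05_re  q26 * q26_re  q23 * q23_re  q24 * q24_re
         / list missing;
  /* Out-of-range values should now appear as missing. */
  tables age ghr / missing;
run;
```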
Step 3. Running Macro_CAHPS50.sas—Specifying Parameter Options
This step is divided into two parts. If you are not using post-stratification weighting, refer to Step 3a.
If you are using post-stratification weighting, refer to Step 3b.
Step 3a. No Post-Stratification Weighting Case
The following statement includes the macro code MACRO_CAHPS50.SAS, where the path before
"Macro_cahps50.sas" is the location of the macro file.
%include "/data/cahps/macros/data/Macro_cahps50.sas" ;
Examples of using these arguments with the test data set are provided below.
/* Executes the CAHPS macro for the global rating scale variable, with no case-mix adjuster variables. */
%cahps(
var = q18, /*Name of the variable to be
analyzed*/
vartype = 2, /*Set the type of variable: 2 = rating
scale(0-10)*/
name = Rating Provider, /*Label for the outcome variable*/
adultkid = 3, /*Specify how to analyze child and
adult surveys: 3 = analyze adult data
only*/
dataset = test, /*Name of the input data set*/
outname = rprov /*Name used for the output data set*/
);
/* Executes the CAHPS macro for the "How Often" composite measure, with two case-mix adjuster
variables; instructs the macro to impute any missing case-mix adjuster variable responses;
recodes the 4-point scale to collapse into 1-2|3|4 for 3-part frequency; and uses PROC
SURVEYREG for the case-mix model. */
%cahps(
var = q11 q12 q14 q15, /*Names of the variables in the composite
measure to be analyzed*/
vartype = 3, /*Set the type of variables: 3 =
"never" to "always" (1-4)*/
recode = 1, /*Recode the scale: 1 = 3-part
frequency where the 4-point scale is
collapsed into 1-2|3|4*/
name = Provider Communication Composite measure, /*Label for the outcome variable*/
adultkid = 3, /*Specify how to analyze child and
adult surveys: 3 = analyze adult data
only*/
adjuster = age ghr, /*List of case-mix adjuster variables
to include*/
impute = 1, /*Flag to impute case-mix adjusters
that are missing*/
dataset = test, /*Name of the input data set*/
outname = ProvComm, /*Name used for the output data set*/
proc_type = 1 /*Specifies the use of PROC SURVEYREG
for the case-mix model*/
);
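When several single items need the same analysis, the macro call can be wrapped in a loop. The sketch below is one possible approach, not part of the distributed test programs: the wrapper name run_items is hypothetical, it uses only parameters shown in the examples above, and it assumes MACRO_CAHPS50.SAS has already been included and that each item name is acceptable as an OUTNAME value.

```sas
/* Sketch: call %cahps once per item in a space-separated list.        */
/* run_items is a hypothetical wrapper; %cahps must already be loaded. */
%macro run_items(items);
  %local i item;
  %let i = 1;
  %let item = %scan(&items, &i);
  %do %while (%length(&item) > 0);
    %cahps(
      var      = &item,
      vartype  = 3,
      recode   = 1,
      name     = Item &item,
      adultkid = 3,
      dataset  = test,
      outname  = &item
    );
    %let i = %eval(&i + 1);
    %let item = %scan(&items, &i);
  %end;
%mend run_items;

%run_items(q11 q12 q14 q15);
```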
Step 3b. Post-Stratification Weighting Case
To combine data for reporting from different sampling groups, or strata, you must add a text file to the
program before the macro run. The following examples illustrate situations in which this feature might be used:
1. Two health plans are merged that were formerly separate and were treated as such in the survey.
The two former health plans are the strata, with each assigned a weight to combine for post-
stratification into a single health plan score.
2. A health system surveys patients in all their practice sites by urban or rural locations, but the
system wants to combine these urban and rural patients into their respective practice sites for
reporting.
If stratification is part of your survey design, you must create an ASCII data set with columns separated
by one or more spaces for these four variables:
Original Plan – a unique identifier of the entities or strata before they are combined. This
variable can be coded as alphanumeric, but it cannot exceed 16 characters. This variable is the
first column of the data table.
New Plan – identifier for the entities that will be created by combination of strata or post-
stratification. This variable can be coded as alphanumeric, but it cannot exceed 16 characters.
This variable is the second column of the data table. If no stratification is being done, this column
may look identical to the column for original plan.
Strata Weight – a numeric variable that indicates the size of the population for the entities or
strata. This variable is used to create the weights for combining the strata. This variable is the
third column of the data table. If no stratification is being done, this column may be set to 1s.
Subsetting Code – identifier for the subset (e.g., region, state, county) to which the entity
belongs. This variable can be coded as alphanumeric. If no subsetting is to be done, this column
may be set to 1s.
The ASCII file for the PLAN details should not contain any missing data; each column of data should be
separated by spaces. If tabs are used, the Analysis Program may not read in the data correctly. Also, be
sure not to have any extra records at the bottom of the ASCII file. Importing an ASCII file into SAS is the
only way to add stratification information.
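The file requirements above (no missing values, no tabs, no extra records at the bottom) can be checked before the macro run. The following sketch reads the file independently of the Analysis Program and writes a warning to the log for any line that breaks those rules. The macro variable planfile and its path are hypothetical, and the variable names are chosen for illustration.

```sas
/* Sketch: validate a PLAN detail ASCII file before the macro run.   */
/* The path below is a hypothetical location; adjust as needed.      */
%let planfile = &root./data/plandtal.dat;

data _null_;
  infile "&planfile" truncover;
  length origplan newplan $ 16 subcode $ 32;
  input @;                                  /* hold the raw line for inspection */
  if index(_infile_, '09'x) > 0 then
    put "WARNING: tab character found on line " _n_;
  input origplan $ newplan $ stratwgt subcode $;
  /* A blank trailing record also triggers this warning. */
  if cmiss(origplan, newplan, stratwgt, subcode) > 0 then
    put "WARNING: missing value on line " _n_;
run;
```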
You may use the _3B_TEST_CAHPS50_STRATIFIED program as a starting point to create a PLAN detail
ASCII file and change variable names and paths as needed.
An example of the PLAN detail data set is provided for the test program
(_3B_TEST_CAHPS50_STRATIFIED.SAS). The data file is called “plandtal.dat” and looks like the text
below:
Table A.1 Sample Data for Post-Stratification Weighting
Origplan (PLAN for each strata)   Newplan (post-stratification PLAN)   stratwgt   Subcode
PRACTICE_A_URBAN PRACTICE_A 5000 Northeast
PRACTICE_B_URBAN PRACTICE_B 8000 Northeast
PRACTICE_C_URBAN PRACTICE_C 15000 South
PRACTICE_B_RURAL PRACTICE_B 2000 Northeast
PRACTICE_C_RURAL PRACTICE_C 3000 South
The first column provides a unique identifier for each strata/plan (original plan).
The second column, the new plan variable, indicates which strata/plan will be combined for post-
stratification weighting.
The third column, the strata population size, is used to compute the weights for the strata. Strata
with greater population sizes receive more weight than smaller units in the combined strata.
The fourth column is the region (subset) of the country in which each strata/plan does business.
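To see how the population sizes in Table A.1 translate into relative weights, the sketch below reads the same five rows and computes each stratum's share of its combined plan. The proportional share shown here is an assumption about how the strata weights are normalized; the Analysis Program's internal calculation may differ in detail.

```sas
/* Sketch: each stratum's share of its post-stratification plan (Table A.1 data). */
/* Proportional shares are an illustrative assumption, not the macro's own code.  */
data plandtal;
  length origplan newplan $ 16 subcode $ 16;
  input origplan $ newplan $ stratwgt subcode $;
  datalines;
PRACTICE_A_URBAN PRACTICE_A 5000 Northeast
PRACTICE_B_URBAN PRACTICE_B 8000 Northeast
PRACTICE_C_URBAN PRACTICE_C 15000 South
PRACTICE_B_RURAL PRACTICE_B 2000 Northeast
PRACTICE_C_RURAL PRACTICE_C 3000 South
;
run;

proc sql;
  select origplan, newplan, stratwgt,
         stratwgt / sum(stratwgt) as share format = 6.3
  from plandtal
  group by newplan
  order by newplan, origplan;
quit;
```

Under this assumption, PRACTICE_B_URBAN contributes 8000/10000 = 0.8 of PRACTICE_B and PRACTICE_B_RURAL contributes 0.2, while PRACTICE_A consists of a single stratum with a share of 1.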
After the text file is created, use code similar to that in Step 3a to run the macro. In the macro statement, you
must set WGTDATA = 2 to use the post-stratification weighting. For more details, please refer to
examples of using these arguments with the test data set in _3B_TEST_CAHPS50_STRATIFIED.SAS.
Examples of using these parameters with the test data set are provided below.
/* Executes the CAHPS macro with the global rating scale variable, case-mix adjusters, and
post-stratification weighting. */
%cahps(
var = q18, /*Name of the variable to be analyzed*/
vartype = 2, /*Set the type of variable: 2 = rating
scale(0-10)*/
name = Rating Provider, /*Label for the outcome variable*/
adjuster = q25 q23_re, /* List of case-mix adjuster variables to
include */
adultkid = 3, /*Specify how to analyze child and adult
surveys: 3 = analyze adult data only*/
adj_bars = 1, /*Flag for the frequencies to be case-mix
adjusted*/
bar_stat = 0, /*Flag if case-mix adjusted frequencies
should be saved*/
wgtdata = 2, /*Combine strata and conduct post-
stratification weighting */
subset = 3, /*Subset case-mix adjustment model for the
subset group (Northeast and South) */
impute = 1, /*Flag if impute case-mix adjusters that are
missing*/
dataset = test2, /*Name of the input data set*/
outname = rprov /*Name used for the output data set*/
);
Contents of the Test Data Set (TEST_CAHPS50_DATA)
Table A.2 lists the contents of the test data set (TEST_CAHPS50_DATA.sas7bdat).
Table A.2 Description of test data set variables based on CAHPS Clinician & Group
Adult Survey 3.0
Variable Description Response options and Formats
PlanID Plan identification number 1 = PRACTICE_A_URBAN ;
2 = PRACTICE_B_URBAN ;
3 = PRACTICE_C_URBAN ;
4 = PRACTICE_B_RURAL ;
5 = PRACTICE_C_RURAL ;
. = Missing
Q05 Last 6 months, make appointment for an
illness, injury with provider
1=Yes
2=No
.=Missing
98=Inapplicable
99=No Answer Given
Q06 Last 6 months, how often get
appointment for routine care as soon as
needed
1=Never
2=Sometimes
3=Usually
4=Always
.=Missing
98=Inapplicable
99=No Answer Given
Q11 Last 6 months, how often provider
explains things
1=Never
2=Sometimes
3=Usually
4=Always
.=Missing
98=Inapplicable
99=No Answer Given
Q12 Last 6 months, how often provider listens
carefully
1=Never
2=Sometimes
3=Usually
4=Always
.=Missing
98=Inapplicable
99=No Answer Given
Q14 Last 6 months, how often provider shows
respect
1=Never
2=Sometimes
3=Usually
4=Always
.=Missing
98=Inapplicable
99=No Answer Given
Q15 Last 6 months, how often provider spends
enough time with you
1=Never
2=Sometimes
3=Usually
4=Always
.=Missing
98=Inapplicable
99=No Answer Given
Q18 Last 6 months, rate provider 0 (worst) - 10 (best)
.=Missing
98=Inapplicable
99=No Answer Given
Q23 Rate overall general health 1=Excellent
2=Very Good
3=Good
4=Fair
5=Poor
.=Missing
98=Inapplicable
99=No Answer Given
Q24 Rate overall mental health 1=Excellent
2=Very Good
3=Good
4=Fair
5=Poor
.=Missing
98=Inapplicable
99=No Answer Given
Q25 Age 1=18 to 24
2=25 to 34
3=35 to 44
4=45 to 54
5=55 to 64
6=65 to 74
7=75 or older
.=Missing
98=Inapplicable
99=No Answer Given
Q26 Gender 1=Male
2=Female
.=Missing
98=Inapplicable
99=No Answer Given
Q27 Highest education level completed 1=<= 8th grade
2=Some high school
3=High school grad/GED
4=Some college/2-yr degree
5=4-yr college grad
6=>4-yr college degree
.=Missing
98=Inapplicable
99=No Answer Given
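Because every item in Table A.2 can carry the special codes 98 and 99, it is worth knowing how often they occur before deciding how to handle them in data preparation. The sketch below simply counts them across all listed items and writes the total to the log; it assumes the test data set is available through the IN libref and makes no claim about how the Analysis Program itself treats these codes.

```sas
/* Sketch: count responses coded 98 (Inapplicable) or 99 (No Answer Given). */
/* Assumes the test data set is in the IN library.                          */
data _null_;
  set in.test_cahps50_data end = last;
  array items {*} q05 q06 q11 q12 q14 q15 q18 q23 q24 q25 q26 q27;
  do i = 1 to dim(items);
    if items{i} in (98, 99) then n_special + 1;   /* sum statement starts at 0 */
  end;
  if last then put "Responses coded 98 or 99: " n_special;
run;
```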
Appendix B. Statistical Explanation of Macro Parameters
This appendix contains detailed explanations of some of the macro parameters. It is divided into three
subsections:
Detailed Explanation of Analyses Performed in the CAHPS Analysis Program
Code Descriptions and Resources
Detailed Explanation with Statistical Notations (describes how some of the macro parameters are
implemented in the Analysis Program)
Detailed Explanation of Analyses Performed in the CAHPS Analysis Program
Case-mix Adjustment
Health status and age are two patient characteristics frequently found to be associated with patient reports
about the quality of their medical care. People in worse health tend to report more problems with care
than do people in better health. Older patients tend to report fewer problems with care than do younger
patients, although this association is usually not as strong as the one between health status and ratings.
Health status may be related to ratings of care because sicker persons are more likely to give negative
ratings in general (response tendency), because some people are likely to give negative ratings about
anything, including their health and the medical care they receive (correlated error), or because they get
worse care (perhaps because their greater needs create more opportunities for failure). The age association
has the same ambiguity. However, regardless of the reason, it is misleading to rate an entity worse simply
because of the kind of patients it treats.
In the Analysis Program, if data are missing for an adjuster variable, the program either (at the option of
the user) deletes the case or imputes the entity mean for that variable. The latter procedure avoids losing
observations because of missing data; it is acceptable in this setting because, typically, both the size of the
adjustment and the amount of missing data on adjusters are small.
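The entity-mean imputation described above can be illustrated outside the macro. The sketch below fills each missing adjuster value with the mean of that adjuster within the respondent's entity. It is an illustration of the idea rather than the macro's own code, and the data set adult, the grouping variable plan, and the new names age_imp and ghr_imp are assumptions carried over from the Appendix A examples.

```sas
/* Sketch: impute the entity (plan) mean for missing adjuster values.     */
/* Illustrative only; within the macro this is requested with IMPUTE = 1. */
proc sql;
  create table adult_imputed as
  select a.*,
         coalesce(a.age, mean(a.age)) as age_imp,   /* plan mean replaces a missing age */
         coalesce(a.ghr, mean(a.ghr)) as ghr_imp    /* plan mean replaces a missing ghr */
  from adult as a
  group by a.plan;
quit;
```

PROC SQL remerges the group means with the individual rows, so the log shows a note about remerging summary statistics; that note is expected here.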
Sometimes case-mix adjustments may be required for an entity, but for some reason it would not be
desirable for the ratings from that entity to affect the estimated case-mix coefficients or the recentering of
entity scores. An example in Medicare CAHPS would be where the purpose of the implementation is to
make comparisons among Medicare Advantage (MA) plans, but data were also collected for non-MA
plans and the survey user wants to include them for comparison without affecting the MA scores. A quick
way to implement case-mix adjustment in this instance is to use the case-weighting option. Data from the
entities designated not to affect the model are retained in the sample but assigned very small weights
(such as 0.0000001, or 0.0000001 times their sampling weights if the data are already weighted). The
case-mix model is then applied as usual, using the weights. This trick works because (1) the weights for
the designated entities are so small that the associated data have essentially no influence on the fitted
model and (2) case-mix adjustment is performed in full irrespective of the weights.
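The down-weighting approach can be sketched in a short DATA step. The flag exclude_from_model, the sampling weight sampwgt, and the output variable casewgt are hypothetical names; the resulting weight would then be supplied to the macro as the respondent-level weight (assumed here to be the WGTRESP parameter).

```sas
/* Sketch: give designated entities a negligible weight in the case-mix model. */
/* exclude_from_model, sampwgt, and casewgt are hypothetical variable names.   */
data weighted;
  set adult;
  if exclude_from_model = 1 then casewgt = 0.0000001 * sampwgt;
  else casewgt = sampwgt;
run;
```

If the data are not already weighted, replace sampwgt with 1 so that the designated entities receive a weight of 0.0000001 and all other cases receive a weight of 1.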
Case Weighting
Weighting arises at three points in the computations performed by the CAHPS Analysis Program: (1)
Estimation of case-mix regression coefficients, (2) Calculation of adjusted entity means, and (3)
Calculation of overall mean and significance tests of difference from the overall mean.
(1) Estimation of case-mix regression coefficients. We may think of this calculation as proceeding in
two stages: first calculating sufficient statistics (the statistics for each entity used in calculating the
coefficients) for regressions within each entity, and then pooling these estimates across the entities,
weighting the sufficient statistics by the corresponding entity weight. The weighting issues in the first
stage concern the weights given individual cases in the sufficient statistics for the within-entity
regressions, and in the second stage concern the weights used when pooling the within-entity estimates
across entities. In general, within-entity regression estimates that ignore the weights will be biased and
inconsistent if the weights are related to residuals from the regression, so it is advisable to use the
within-entity weights (if they are available) unless it is known that the sampling was conducted in a way
that does not create bias if the weights are ignored.
There is more leeway in choice of weights at the entity level when pooling the within-entity estimates.
Weighting each entity’s statistics by the sum of the case weights of cases in an entity yields estimated
coefficients that are representative of the entire population, by weighting the data from each entity by the
total population of the entity. While population representativeness is a common objective for analysis of
surveys, it has some disadvantages in CAHPS surveys because CAHPS results are reported for entities
rather than the population as a whole. If a few entities have much larger populations than others, they
could dominate estimation of the coefficients in a weighted regression; this could be undesirable because
the objective of regression modeling in case-mix adjustment is to estimate a model that fits reasonably
well across all the entities being compared, not just the largest ones or the pooled population.
Furthermore, such disproportionate weighting is generally less efficient statistically than a weighting that
is more uniform, yielding larger variances for the same amount of data. This approach may nonetheless be
desirable if the primary goal of the analysis is to obtain nationally representative estimates, for example,
for national comparison of subgroups of patients that cut across entities, such as those in different regions
or racial/ethnic groups.
Another option is to weight each entity’s data equally; this can be implemented by dividing each case’s
weight by the total weight for the entity. This serves the objectives of CAHPS analyses where the primary
objective is to compare entities or to examine effects of entity-level factors, but may be inefficient if the
sample sizes per entity vary greatly, especially if some entities have very large samples.
A third weighting option weights each entity by its number of respondents (“precision weighting”); this
can be implemented by multiplying each case’s weight by the ratio of number of respondents to total
weight for the entity. Holding other things approximately equal across entities (such as the residual
variance and the within-entity distribution of characteristics), this is statistically the most efficient
method. In this option, entities with small samples do not gain disproportionate weight. While the largest
entity samples do gain more influence in the regression, in many CAHPS applications the sample sizes
are bounded by design (or by limited resources) so a large entity population does not translate into a
proportionately large entity influence in the regression. A possible disadvantage of this method is that it
depends on the sample design and response/nonresponse patterns, and therefore has no clear population
interpretation. Nonetheless, we recommend this as the default option because it is the most robust and
often most statistically efficient method.
The final calculation of case (individual) weights for the case-mix regressions can be understood as
consisting of three steps:
First, calculate within-entity weights that sum to 1 in each entity; these are equal to the weights
provided to the macro divided by the sum of the weights in each entity.
Second, calculate entity weights using one of the options defined above.
Third, multiply the within-entity weights by the entity weights to get the weight used in
regression.
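The three steps above can be traced with a small made-up example. The sketch below uses five illustrative cases in two entities and computes the regression weight under each of the three entity-weighting options (population, equal, and precision). The data values and all variable names are invented for illustration, and this is not the macro's own code.

```sas
/* Sketch: case weights for the case-mix regression under three entity options. */
/* The five cases below are made-up illustrative data.                          */
data cases;
  input plan $ w;
  datalines;
A 2
A 2
A 4
B 1
B 3
;
run;

proc sql;
  create table entity_stats as
  select plan, w,
         w / sum(w) as w_within,   /* step 1: within-entity weights sum to 1 */
         sum(w)     as ent_pop,    /* step 2, population option: sum of case weights */
         count(*)   as ent_prec    /* step 2, precision option: number of respondents */
  from cases
  group by plan;
quit;

data regwgt;
  set entity_stats;
  w_population = w_within * ent_pop;    /* step 3: equals the original weight w */
  w_equal      = w_within * 1;          /* step 3: every entity has total weight 1 */
  w_precision  = w_within * ent_prec;   /* step 3: entity total equals its respondent count */
run;

proc print data = regwgt noobs;
run;
```

The population option reproduces the weights supplied to the macro, while the equal and precision options rescale them so that each entity's total weight is 1 or its number of respondents, respectively.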
(2) Calculation of adjusted entity means: Because the entity means are calculated separately for each
entity, entity level weights are not relevant to this calculation. On the other hand, for the reasons
described above, the within-entity weighting is usually important to calculation of representative
estimates of entity means. Thus, we recommend that this calculation use any weights that vary across
cases within the same entity, and this is the only option in the Analysis Program.
(3) Calculation of overall mean and significance tests of difference from the overall mean: Since this
step operates only on the entity means, within-entity weight variation is not relevant here. The definition
of the overall mean affects both recentering of the entity means and significance tests of differences from
that mean. A complicating circumstance is that for a composite measure, the number of cases or total
weight may be different for each of the items going into the composite measure. The weighting choices for
calculation of the overall mean are to
a. weight the entity means equally,
b. use weights equal to the sum of the weights for cases within each entity, which produces an
estimate of the combined mean of the entire population of cases, or
c. use weights equal to the number of observations used in the calculation of the mean (or the total
of these numbers across the items of a composite measure).
These options are parallel to the options for entity weighting of case-mix adjustment models, but the two
selections are independent.
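A small made-up example shows how much the choice among options a, b, and c can move the overall mean. The entity means, weight totals, and counts below are illustrative values only, and the variable names are invented for this sketch.

```sas
/* Sketch: overall mean under the three weighting choices (made-up data). */
data entity_means;
  input plan $ adj_mean sum_wgt n_obs;
  datalines;
A 8.0 5000 200
B 8.5 10000 250
C 9.0 18000 300
;
run;

proc sql;
  select mean(adj_mean)                         as overall_equal format = 8.3, /* option a */
         sum(sum_wgt * adj_mean) / sum(sum_wgt) as overall_pop   format = 8.3, /* option b */
         sum(n_obs * adj_mean) / sum(n_obs)     as overall_n     format = 8.3  /* option c */
  from entity_means;
quit;
```

With these values the equally weighted mean is 8.5, the population-weighted mean is about 8.697, and the mean weighted by number of observations is about 8.567, so the largest, highest-scoring entity pulls the population-weighted benchmark upward.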
The choice of method for calculating the overall mean affects the tests of each entity’s difference from
that mean. For example, if one entity has a much larger enrollment than the others, and also an unusually
high mean score, it will pull up the overall mean so it will become more difficult for an entity to
demonstrate significantly better performance than average. With equal weighting of entities, the
comparison is to the mean of entity means, which generally lies in the “middle” of the entity scores but is
not necessarily representative of the combined population of cases.
We recommend choosing between these options based on the interpretation that will be given to the
reported overall mean and therefore to the comparison of each entity’s adjusted mean to that overall
mean. The usual comparisons of entities for quality reporting, incentives, and similar purposes are
intended to place each entity in relation to the collection of entities; we recommend the unweighted mean
of entities (equivalent to equal entity level weights) as the appropriate standard of comparison.
Item Weighting – Algorithm for Composite Measures
The CAHPS Analysis Program uses item weights to compute the means of the composite measures for
each entity. There are three types of weights that users can select in the EVEN_WGT macro parameter.
To use the sum of the number of respondents for the item weights, select EVEN_WGT = 0. The
EVEN_WGT = 2 option uses the sum of the individual weights by each item for the item weight. For the
EVEN_WGT = 1 option, two methods are available for computation of the item weights. First, the item
weight equals one divided by the total number of items. So if equal weighting was chosen and there were
four items in the composite measure, the item weight is 1/4 = 0.25 for each item. An advantage of this
approach is that the relative weights of the items in the composite measure are consistent among survey
administrations. Furthermore, survey users may regard each item as equally important even if some are
answered more frequently than others. A disadvantage of this option is a possible loss of statistical
precision if an item with few responses is combined, equally weighted, with an item with many responses.
Thus, the EVEN_WGT = 1 option includes a second method that addresses this problem by
down-weighting low-response items.
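The difference between equal item weights and weights based on the number of respondents can be seen with a made-up four-item composite in which one item has few responses. The item means and counts below are illustrative only; the sketch covers the first EVEN_WGT = 1 method and the EVEN_WGT = 0 option, and it does not attempt to reproduce the down-weighting method, whose formula is not given here.

```sas
/* Sketch: composite mean with equal item weights versus respondent-count weights. */
/* Item means and response counts are made-up illustrative values.                 */
data item_means;
  input item $ item_mean n_resp;
  datalines;
q11 3.6 200
q12 3.4 180
q14 3.8 190
q15 3.2 50
;
run;

proc sql;
  select mean(item_mean)                      as composite_equal format = 8.3, /* each item weighted 1/4 */
         sum(n_resp * item_mean) / sum(n_resp) as composite_by_n format = 8.3  /* weighted by respondents */
  from item_means;
quit;
```

Here the equally weighted composite is 3.5, while weighting by respondents gives about 3.571, because the low-scoring item q15 has only 50 responses and therefore counts for less.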
Variance Estimation
Variances are calculated for the mean for each entity, conditional on the coefficients for the adjuster
variables. Conditionally these means are independent (ignoring the recentering constant that is added to
make the mean of the adjusted means equal to that of the unadjusted means for presentation purposes).
Conditioning on the regression coefficients is a standard procedure in variance estimation in the analysis
of surveys (see Cochran, Sampling Techniques, 1977, Chapter 7). It is not difficult to allow for the
covariance of the adjusted means due to uncertainty about the regression coefficients in the case of single-
item reports, but it is difficult to do this in a general way for the multi-item composite measures, when the
pattern of missing data varies by item. In the interest of consistency, we use the same procedure for both
classes of reports.
Code Description and Resources
Case Weighting
Table B.1 lists the types of weight options available in the CAHPS Analysis Program.
Table B.1 List of Macro Weight Options

| Phase of estimation | Option | Indications and Advantages | CAHPS Analysis Program options |
|---|---|---|---|
| Regression for case-mix coefficients: within-entity weighting | Use weights | Generally recommended; makes estimates more population-representative and reduces chance of bias due to association of weights with outcomes | Only option allowed if weights are provided (wgtresp) |
| Regression for case-mix coefficients: within-entity weighting | Ignore weights | Use when inefficiency of estimation with unequally-weighted data is a problem; consider possible biases first. | Do not provide weights |
| Regression for case-mix coefficients: entity-level weighting | Population: sum of case weights | When population-weighted regression coefficients are of interest | (wgtresp and wt_type = 2) |
| Regression for case-mix coefficients: entity-level weighting | Equal by entity | When primary objective is comparison among entities of equal importance | (wgtresp and wt_type = 1) |
| Regression for case-mix coefficients: entity-level weighting | Precision: by number of respondents | Maintain statistical efficiency (precision of coefficients) when responding sample sizes for entities vary greatly | (wgtresp and wt_type = 0) |
| Calculation of entity means: within-entity weighting | Use individual-level weights | Generally recommended. | Only option allowed if weights are provided (wgtmean) |
| Calculation of entity means: within-entity weighting | Ignore weights | Only if weights are known to be irrelevant. | Do not provide weights |
| Recentering and tests of adjusted means: entity-level weighting | Preserve and test against unweighted mean of means | Relevant when entities are treated as equal members of population of entities (as in comparisons for quality reporting or incentives) | Unweighted option (overall_wt = 1) |
| Recentering and tests of adjusted means: entity-level weighting | Population: sum of case weights | Relevant when testing the entity mean against the population mean is desired. | Population weight option (overall_wt = 2) |
| Recentering and tests of adjusted means: entity-level weighting | Use number of respondents | This is the most efficient weight. It can be used when there is no information about the population. | Number of respondents option (overall_wt = 0) |
Table B.2 shows how each type of weight can be calculated.
Table B.2 Case Weighting Used for Case-mix Coefficients

The last three columns show, for each weight option, the entity total and the derived case estimation
weights.

| Entity | Survey weight | Within-entity weight = (Survey weight)/(Entity total) | Population weighting of entities (Entity total = sum of survey weights) | Equal weighting of entities (Entity total = 1) | Precision weighting of entities (Entity total = number of respondents) |
|---|---|---|---|---|---|
| Hxxxx | 30 | 30/180 = .1667 | 30 | 0.1667 | 0.833 |
| Hxxxx | 30 | 30/180 = .1667 | 30 | 0.1667 | 0.833 |
| Hxxxx | 40 | 40/180 = .2222 | 40 | 0.2222 | 1.111 |
| Hxxxx | 40 | 40/180 = .2222 | 40 | 0.2222 | 1.111 |
| Hxxxx | 40 | 40/180 = .2222 | 40 | 0.2222 | 1.111 |
| Entity Hxxxx total and weight in estimation | 180 | 1 | 180 | 1 | 5 |
| Hyyyy | 30 | 30/80 = .3750 | 30 | 0.375 | 1.125 |
| Hyyyy | 30 | 30/80 = .3750 | 30 | 0.375 | 1.125 |
| Hyyyy | 20 | 20/80 = .2500 | 20 | 0.25 | 0.75 |
| Entity Hyyyy total and weight in estimation | 80 | 1 | 80 | 1 | 3 |
Explanation of weight calculation:
1. Survey weight is entered by the user as a variable in the input data set. It may incorporate
sampling, nonresponse, and/or post-stratification weights. If no weights will be used, this is set to
1 for every responding case. The macro calculates the sum of each respondent’s weight for each
entity.
2. Within-entity weight is the fraction of the entity’s total weight that is assigned to a specific case,
defined as the survey weight of the observation divided by the entity sum of these weights.
3. The total estimation weight in each entity is determined by the entity weighting option chosen
(gray cells).
a. For each entity, this weight is allocated to the observations in the entity in proportion
to their within-entity weights. For population weighting, this recovers the original survey
weights.
b. Because the number of responses may vary across items, these calculations are
repeated for each item.
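The three steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the macro's SAS code: the function name `case_estimation_weights` and the record layout are assumptions made for the example, and the `wt_type` values follow the options listed in Table B.1 (2 = population, 1 = equal, 0 = precision).

```python
from collections import defaultdict


def case_estimation_weights(records, wt_type):
    """Return one estimation weight per record.

    records: list of (entity, survey_weight) pairs (hypothetical layout).
    wt_type: 2 = population, 1 = equal by entity, 0 = precision (Table B.1).
    """
    totals = defaultdict(float)  # sum of survey weights per entity
    counts = defaultdict(int)    # number of respondents per entity
    for entity, weight in records:
        totals[entity] += weight
        counts[entity] += 1

    weights = []
    for entity, weight in records:
        within = weight / totals[entity]  # step 2: within-entity weight
        if wt_type == 2:
            entity_total = totals[entity]
        elif wt_type == 1:
            entity_total = 1.0
        elif wt_type == 0:
            entity_total = float(counts[entity])
        else:
            raise ValueError("wt_type must be 0, 1, or 2")
        # step 3a: allocate the entity total in proportion to within-entity weight
        weights.append(within * entity_total)
    return weights


# The cases shown in Table B.2
records = [("Hxxxx", 30), ("Hxxxx", 30), ("Hxxxx", 40), ("Hxxxx", 40), ("Hxxxx", 40),
           ("Hyyyy", 30), ("Hyyyy", 30), ("Hyyyy", 20)]
precision_weights = case_estimation_weights(records, wt_type=0)
```

Under population weighting the function returns the original survey weights, and under precision weighting the weights within each entity sum to its number of respondents, matching the table.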
Item Weighting – Algorithm for Composite Measures
When each item weight is assigned equally by selecting the equal weight option (EVEN_WGT = 1) to
calculate the composite measure mean, a problem may arise if some of the items have low responses. To
solve this problem, the even weight option has a method to assign the item weight by downweighting
low-response items.
The first modification is motivated by the fact that responses to different items in the same composite
measure often have different mean values for a variety of reasons, including how frequently problems
arise in different kinds of interactions and services and how the questions are worded. If the items are
weighted the same way for every entity to calculate the composite measure, the effect of these unequal
means across entities is minimal. However, if the items are not weighted equally, this could give rise to
variations unrelated to variations in quality.
Thus, we first modify the calculation of weighted composite measures to minimize the impact of such
differences in item means on expected scores. To explain the need for this modification, suppose $y_i$ is
the mean score for item i at a given entity, and $\mu_i$ is the mean score for item i across all entities.
With weights $w_i$ that sum to 1, the composite measure score is $\sum_i w_i y_i$ for a specific plan, and
if that plan is at the average on all measures, its score is $\sum_i w_i \mu_i$. If the overall means
$\mu_i$ differ, this last expression will depend on $w_i$; in other words, even two plans that are average
on every measure will receive different composite measure scores if the composite measures are calculated
with different weights.
To remove this dependence, we center the scores at their means before combining them. Suppose now
that $w_i$ represents the weight for item i at a particular entity, and $w_{0i}$ represents some standard
weights common to the entire report. Now define a composite measure score as

$$\sum_i w_i (y_i - \mu_i) + \sum_i w_{0i} \mu_i .$$

Any entity that is average ($y_i = \mu_i$) on every item will receive the same composite measure score
$\sum_i w_{0i} \mu_i$ regardless of the weights $w_i$, so bias due purely to weighting is removed even if
different entities are scored with different weights. Note that the second term of this composite measure
score expression is the same for every entity; it is included only to bring the average back to an
interpretable level as an average score of overall means.
Given this modification, we can now consider modifying item weights for different entities. The main
requirement is that the weight must be zero ($w_i = 0$) when there are no responses for item i; we also
want the weights to be equal (or at least to approach equality) when there is “adequate” sample for every
item. One simple weighting mechanism meeting these requirements is as follows:
1. Set $w_{0i} = 1/I$, i = 1, …, I, where I is the number of items in the composite measure.
2. Choose a cutoff number of observations K; weights will not be modified for items with at least K
observations.
3. Define entity-specific weights $w_i = \min(n_i, K) / \sum_{i'=1,\ldots,I} \min(n_{i'}, K)$, where $n_i$
is the number of responses from the entity for item i, and $\min(n_i, K)$ is the lesser of $n_i$ and K.
4. Calculate composite measure scores as described above.
This procedure has the following desirable properties:
For each entity, all items with at least K responses are given equal weight. Consequently, there is
no modification to equal item weighting for entities with large samples.
Items with no responses in a given entity are given no weight, so the composite measure score
can still be calculated.
Items with low numbers of responses (<K) are given reduced weight so their effect on variance is
mitigated.
The criterion for determining whether an item will be downweighted is very simple to describe.
The procedure can easily be modified for unequal baseline weights $w_{0i}$.
Table B.3 illustrates the calculation of item weights for various scenarios in a composite measure with
three items, assuming that the target minimum sample size K=20.
Table B.3 Examples of a composite measure with three items using a macro parameter K

| Sample sizes $n_i$ | $\min(n_i, K)$ | Calculation of weights $w_i$ | Weights $w_i$ simplified | Interpretation |
|---|---|---|---|---|
| 60, 70, 80 | 20, 20, 20 | 20/60, 20/60, 20/60 | 1/3, 1/3, 1/3 | Every item has adequate sample so equal weighting is OK. |
| 0, 22, 24 | 0, 20, 20 | 0/40, 20/40, 20/40 | 0, 1/2, 1/2 | Item with no responses gets no weight. |
| 10, 22, 34 | 10, 20, 20 | 10/50, 20/50, 20/50 | 1/5, 2/5, 2/5 | One item has low response and is downweighted. |
| 2, 3, 5 | 2, 3, 5 | 2/10, 3/10, 5/10 | 2/10, 3/10, 5/10 | If all samples are small, weight each item proportional to the number of responses to improve the efficiency of estimation. |
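The rows of Table B.3 can be reproduced with a short Python sketch of the downweighting rule. The function name `item_weights` is hypothetical and chosen for this illustration; it is not a name used by the CAHPS macro.

```python
from fractions import Fraction


def item_weights(sample_sizes, K):
    """Entity-specific item weights: min(n_i, K) divided by the sum of min(n_i', K)."""
    capped = [min(n_i, K) for n_i in sample_sizes]
    total = sum(capped)
    if total == 0:
        raise ValueError("no responses for any item in the composite measure")
    # Exact fractions make the results easy to compare with the table
    return [Fraction(c, total) for c in capped]


# Third row of Table B.3: one item has fewer than K = 20 responses
weights_low_response = item_weights([10, 22, 34], K=20)
```

Items with at least K responses always share the same weight, and an item with no responses receives a weight of zero, so the composite can still be computed.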
Table B.4 illustrates the calculation of the “centered” weighted average in an entity in which one item of
the composite measure has few responses (third line of table above), again assuming K=20.
Table B.4 Examples of composite measure with three items using a macro parameter K and means

| Description | Symbol | Item 1 | Item 2 | Item 3 |
|---|---|---|---|---|
| Baseline equal weighting | $w_{0i}$ | 1/3 | 1/3 | 1/3 |
| Overall (all entities) mean | $\mu_i$ | 3.45 | 2.75 | 2.65 |
| Mean in a specific entity | $y_i$ | 3.55 | 2.80 | 2.75 |
| Sample sizes in that entity | $n_i$ | 10 | 22 | 34 |
| Weights in that entity | $w_i$ | 1/5 | 2/5 | 2/5 |
| Centered entity means | $y_i - \mu_i$ | 0.10 | 0.05 | 0.10 |
The baseline weighting is assumed to be equal for the three items. Thus, the overall mean composite
measure score is (3.45+2.75+2.65)/3 = 2.95.
Because at the specific entity of interest there are only 10 responses for Item 1, it is given half the weight
of each of the other items. The weighted mean for the entity is then
(1/5)3.55 + (2/5)2.80 + (2/5)2.75 = 2.93. Note that this is below the overall mean composite measure
score, despite the fact that the entity is above the mean on each item, because the item that generally has a
high score is downweighted.
To calculate the score by the proposed method, we first calculate the centered means (last line of table),
which are all positive. Their weighted mean is (1/5)0.10 + (2/5)0.05 + (2/5)0.10 = 0.08. We then add
this mean deviation to the overall mean, 0.08 + 2.95 = 3.03, which is the reported
score. This correctly reflects the superiority of this entity across all the items.
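The arithmetic of this example can be checked with a small Python sketch. The function name `centered_composite` and its argument names are assumptions made for illustration; the formula is the centered composite score defined earlier in this section.

```python
def centered_composite(entity_means, overall_means, entity_weights, baseline_weights):
    """Weighted mean of centered item means plus the baseline-weighted overall mean."""
    deviation = sum(w * (y - m)
                    for w, y, m in zip(entity_weights, entity_means, overall_means))
    baseline = sum(w0 * m for w0, m in zip(baseline_weights, overall_means))
    return deviation + baseline


# Values from Table B.4
y = [3.55, 2.80, 2.75]      # item means in the specific entity
mu = [3.45, 2.75, 2.65]     # overall item means across all entities
w = [1 / 5, 2 / 5, 2 / 5]   # entity-specific item weights (K = 20)
w0 = [1 / 3, 1 / 3, 1 / 3]  # baseline equal weights

score = centered_composite(y, mu, w, w0)             # approximately 3.03
uncentered = sum(wi * yi for wi, yi in zip(w, y))    # approximately 2.93
```

The uncentered weighted mean falls below the overall composite mean of 2.95, while the centered score is above it, which is the contrast the worked example is meant to show.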
Detailed Explanation with Statistical Notations
Case-mix adjustment
Let $y_{ipj}$ represent the response to item i of respondent j from entity p (after recoding, if any, has
been performed). The model for adjustment of a single item i is of the form

$$y_{ipj} = \beta_i' x_{ipj} + \mu_{ip} + \epsilon_{ipj}$$

where $\beta_i$ is a regression coefficient vector, $x_{ipj}$ is a covariate vector consisting of two or
five adjuster covariates (as described above), $\mu_{ip}$ is an intercept parameter for entity p, and
$\epsilon_{ipj}$ is the error term. The estimates are given by the following equation:

$$(\hat\beta_i', \hat\mu_i')' = (X'X)^{-1} X' y_i$$

where $\mu_i = (\mu_{i1}, \mu_{i2}, \ldots, \mu_{iP})'$ is the vector of intercepts, $y_i$ is the vector of
responses and the covariate matrix is

$$X = (X_a \; u_1 \; u_2 \; \cdots \; u_P)$$

where the columns of $X_a$ are the vectors of values of each of the adjuster covariates, and $u_p$ is a
vector of indicators for membership in entity p, p = 1, 2, …, P, with entries equal to 1 for respondents in
entity p and 0 for others.
Finally, the estimated intercepts are shifted by a constant amount to force their mean to equal the mean of
the unadjusted entity means $\bar{y}_{ip}$ (to make it easier to compare adjusted and unadjusted means),
giving adjusted entity means

$$\hat{a}_{ip} = \hat\mu_{ip} + \frac{\sum_{p'} w_{p'} \bar{y}_{ip'}}{\sum_{p'} w_{p'}} - \frac{\sum_{p'} w_{p'} \hat\mu_{ip'}}{\sum_{p'} w_{p'}} ,$$

where $w_p$ is the sum of the response weights for each entity p and $\hat\mu_{ip}$ is the adjusted entity
mean before recentering.
For single-item responses, these adjusted means are reported. For composite measures, the several
adjusted entity means are combined with equal item weights (one divided by the number of items as
default), that is, by calculating the mean across items.
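As a rough illustration of the adjustment and recentering steps, the following Python sketch handles the simplest case: a single adjuster covariate and unit case weights. For that case, least squares with entity indicators reduces to a pooled within-entity slope, with each intercept equal to the entity mean of y minus the slope times the entity mean of x. The function name `case_mix_adjust`, the record layout, and the equal default entity weights are assumptions made for this example; the macro itself fits the full model in SAS.

```python
from collections import defaultdict


def case_mix_adjust(rows, entity_weights=None):
    """rows: list of (entity, x, y) with one adjuster x (hypothetical layout).

    Returns (slope, unadjusted entity means, recentered adjusted entity means).
    Assumes x varies within at least one entity.
    """
    groups = defaultdict(list)
    for entity, x, y in rows:
        groups[entity].append((x, y))

    xbar = {p: sum(x for x, _ in obs) / len(obs) for p, obs in groups.items()}
    ybar = {p: sum(y for _, y in obs) / len(obs) for p, obs in groups.items()}

    # Pooled within-entity slope (equivalent to OLS with entity indicators)
    sxy = sum((x - xbar[p]) * (y - ybar[p]) for p, obs in groups.items() for x, y in obs)
    sxx = sum((x - xbar[p]) ** 2 for p, obs in groups.items() for x, _ in obs)
    beta = sxy / sxx

    # Entity intercepts: adjusted means before recentering
    intercept = {p: ybar[p] - beta * xbar[p] for p in groups}

    # Shift by a constant so the weighted mean of adjusted means equals
    # the weighted mean of unadjusted means
    w = entity_weights or {p: 1.0 for p in groups}
    shift = sum(w[p] * (ybar[p] - intercept[p]) for p in groups) / sum(w[p] for p in groups)
    adjusted = {p: intercept[p] + shift for p in groups}
    return beta, ybar, adjusted


rows = [("A", 0, 1), ("A", 1, 2), ("A", 2, 3),
        ("B", 1, 3), ("B", 2, 4), ("B", 3, 5)]
beta, unadjusted, adjusted = case_mix_adjust(rows)
```

In this toy data set entity B has higher covariate values, so adjustment narrows the gap between the two entities while leaving the mean of the entity means unchanged.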
Variance of difference from national mean
We first calculate residuals from the regression model for every item response,

$$z_{ipj} = y_{ipj} - \beta_i' x_{ipj}$$

where $\beta_i$ is the regression coefficient vector for item i and $y_{ipj}$ is the response to item i
from person j in entity p. The adjusted mean $\hat\mu_{ip}$ for entity p, item i, is the weighted mean
(across nonmissing observations) of $z_{ipj}$, where $w_{ipj}$ is defined by the weight for item i from
person j in entity p. If we replace $z_{ipj}$ with 0 for all missing responses and define $r_{ipj} = 1$ if
there is a nonmissing response and 0 otherwise, then we can write this as

$$\hat\mu_{ip} = \frac{\sum_j w_{ipj} z_{ipj}}{\sum_j w_{ipj} r_{ipj}}$$

and the composite measure score for the entity is

$$\hat\mu_p = \sum_i w_i \frac{\sum_j w_{ipj} z_{ipj}}{\sum_j w_{ipj} r_{ipj}}$$

where $w_i$ is the composite measure item weight for item i. Linearizing this expression by taking
derivatives with respect to each of the sums $\sum_j w_{ipj} z_{ipj}$ and $\sum_j w_{ipj} r_{ipj}$, we
obtain the following approximation:

$$\hat\mu_p - \mu_p \approx \sum_j \sum_i \frac{w_i}{n_{ip}} w_{ipj} (z_{ipj} - r_{ipj} m_{ip}) = \sum_j d_{pj}$$

where $n_{ip} = \sum_j w_{ipj} r_{ipj}$ is the (weighted) number of responses to item i from entity p,
$d_{pj}$ is defined by the summand, and $m_{ip}$ is the weighted mean of $z_{ipj}$ for the item i in
entity p. We now apply the standard formula for the variance of an estimated sum,

$$\hat{V}_p = \widehat{\mathrm{Var}}(\hat\mu_p) = \frac{n_p}{n_p - 1} \sum_j d_{pj}^2$$

where $n_p$ is the number of respondents from entity p. This gives an estimate of a variance of the
composite measure score for entity p. If the composite measure consists of a single item, or if there is no
item nonresponse, these results correspond to the standard variance formula.
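The linearized variance can be sketched in Python for a single entity. This sketch assumes unit case weights, so each item's residual mean and response count are unweighted; the function name `composite_variance` and the use of `None` to mark a missing response are conventions chosen for this example only.

```python
def composite_variance(z, w):
    """Linearized variance of a composite measure score for one entity.

    z[i][j]: residual for item i, respondent j, or None if missing.
             All item lists must cover the same respondents in the same order.
    w[i]:    composite measure item weight for item i.
    """
    n_p = len(z[0])   # number of respondents from the entity
    d = [0.0] * n_p   # one linearized contribution per respondent
    for z_i, w_i in zip(z, w):
        observed = [v for v in z_i if v is not None]
        n_ip = len(observed)
        if n_ip == 0:
            continue  # an item with no responses contributes nothing
        m_ip = sum(observed) / n_ip
        for j, v in enumerate(z_i):
            if v is not None:
                d[j] += (w_i / n_ip) * (v - m_ip)
    return n_p / (n_p - 1) * sum(x * x for x in d)


# Single item, no missing data: equals the usual s^2 / n for a mean
single_item = composite_variance([[1.0, 2.0, 3.0, 4.0]], [1.0])
```

With one item and no missing responses, the result is the sample variance divided by the number of respondents, in line with the closing remark above.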
Note that we do not apply any finite population corrections in this variance calculation. The finite
population correction is appropriate if the object of our inference is the mean rating from the population
of members or patients who are in entity p at the present time. Our concern, however, is with predicting
the mean rating that would represent the experiences of a new set of subscribers or patients joining or
seeking care at the entity, because we are attempting to give guidance to those who are considering anew
their choice of insurance or treatment site. Conceptually, we regard the present members as a sample from
a super-population of potential users of the entity.
Global F-test
The weighted grand mean is calculated as

$$\hat\mu = \frac{\sum_p w_p \hat\mu_p}{\sum_p w_p}$$

where $w_p$ is the weight from entity p. Then the F-statistic is calculated as

$$F = \frac{1}{P - 1} \sum_p \frac{(\hat\mu_p - \hat\mu)^2}{\hat{V}_p}$$
This statistic has an approximate F distribution with (P-1, q) degrees of freedom; we have found in
simulations that q = n/P (the average sample size per entity) makes the F-test at worst slightly
conservative with typical sample sizes and response distributions. In other words, reported p-values from
the test are slightly larger than they should be, so significant differences are less likely to be declared.
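A minimal Python sketch of the global test statistic follows. It computes the weighted grand mean, the F-statistic, and the degrees of freedom described above; the function names and the equal default weights are assumptions made for this example, and the p-value lookup (which needs an F distribution function) is left out.

```python
def global_f(means, variances, weights=None):
    """Return (F statistic, weighted grand mean) for P entity means."""
    P = len(means)
    if weights is None:
        weights = [1.0] * P  # assumed default: equal entity weights
    grand = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    f_stat = sum((m - grand) ** 2 / v for m, v in zip(means, variances)) / (P - 1)
    return f_stat, grand


def f_degrees_of_freedom(P, n_total):
    """Degrees of freedom (P - 1, q) with q = n / P, the average sample size per entity."""
    return (P - 1, n_total / P)


f_stat, grand_mean = global_f([3.0, 3.2, 3.4], [0.01, 0.01, 0.01])
```

For these three illustrative entities the statistic is about 4, which would then be referred to an F distribution with the degrees of freedom returned by `f_degrees_of_freedom`.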
T-tests for entity differences from mean
We compare each entity mean to the mean of the entity means using a t-test. The corresponding
contrast is

$$\Delta_p = \hat\mu_p - \frac{1}{P} \sum_{p'} \hat\mu_{p'} = \frac{P - 1}{P} \hat\mu_p - \frac{1}{P} {\sum_{p'}}^{*} \hat\mu_{p'}$$

where ${\sum}^{*}$ represents a sum over all entities except entity p. Note that the last expression is
simply (P-1)/P times the difference of $\hat\mu_p$ from the mean of all entities except entity p;
therefore, the two formulations (mean vs. mean of all, or mean vs. mean of all others) are equivalent. The
variance of $\Delta_p$ is

$$\hat{V}(\Delta_p) = \left(\frac{P - 1}{P}\right)^2 \hat{V}_p + \frac{1}{P^2} {\sum_{p'}}^{*} \hat{V}_{p'}$$

and the t-statistic is calculated as $\Delta_p / [\hat{V}(\Delta_p)]^{1/2}$, and referred to a t
distribution with $(n_p - 1)$ degrees of freedom, which again is usually slightly conservative.
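The contrast and its variance can be computed with a brief Python sketch. The function name `entity_t_statistics` is hypothetical; the calculation compares each entity mean with the unweighted mean of all entity means, as described above, and does not look up p-values.

```python
import math


def entity_t_statistics(means, variances):
    """t-statistic for each entity mean versus the mean of all entity means."""
    P = len(means)
    overall = sum(means) / P
    total_variance = sum(variances)
    t_stats = []
    for m, v in zip(means, variances):
        delta = m - overall
        # ((P-1)/P)^2 times own variance plus 1/P^2 times the sum of the other variances
        var_delta = ((P - 1) / P) ** 2 * v + (total_variance - v) / P ** 2
        t_stats.append(delta / math.sqrt(var_delta))
    return t_stats


t_stats = entity_t_statistics([3.0, 3.2, 3.4], [0.01, 0.01, 0.01])
```

Each statistic would then be referred to a t distribution with the entity's number of respondents minus one as degrees of freedom.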
Appendix C. Summary of Features Included in Each Version of
the CAHPS Analysis Program
Version 1.0 of the CAHPS SAS Analysis Program offered the following features:
An assessment of significance using practical and statistical (p-value) criteria;
An option to analyze data based on outpatient utilization groupings;
An option to analyze child and adult data together or separately;
Comparisons of health plan performance; and
Case-mix adjustments.
Version 1.5 of the CAHPS SAS Analysis Program added the following enhancements:
Weighting and stratification. The SAS program performs the correct analyses for
disproportionate stratified sampling designs. One way such designs might appear is when two
plans that were surveyed separately have subsequently merged their operations into a single
business entity, and their results will be reported as a single plan. They also may appear when the
sponsor decides to collect additional surveys by using larger sample sizes for a certain subset of
people (based on geographic area, gender, age groups, etc.) beyond what would appear there by
proportionate allocation. To use this feature, the user must specify which strata are combined and
the number of members in each stratum out of the entire population (the weights).
Plan name flexibility. Plan identifiers for programming and output purposes are no longer
required to be numeric. Text or numeric names are allowed to facilitate programming and
interpretation of results.
Case-mix adjusters. The program no longer requires two case-mix adjusters (age and health
status) to be used in the analyses. The user can now specify an unlimited number of adjuster
variables or choose not to adjust the data.
Substantive differences. A new method of specifying an absolute difference that must be
achieved before a difference is meaningful has been added to the program. While the previous
method of determining a meaningful difference is still available, the user can now simply choose
an absolute difference that must exist between means for a difference to be flagged as significant.
Results tables. Version 1.5 has an additional feature that creates SAS data sets of the results
tables the program produces. This allows users to perform additional analyses on the aggregate
results or to create summary reports. Linear regression coefficients for the adjuster variables are
now output as part of the results tables and reports.
Missing data for adjusters. In the initial version of the Analysis Program, missing data for the
case-mix adjustment variables was imputed at each item’s health plan mean. Version 1.5 allows
the user to specify whether or not the analysis is conducted with imputation for the adjuster
variables.
Version 2.0 and 2.1 of the CAHPS SAS Analysis Program added the following enhancements and
changes:
The SAS code has been converted to require only Base SAS and the SAS/STAT module,
eliminating the need for SAS/IML. If adjuster variables are excluded, then the REG procedure in
the SAS/STAT module is not needed. The code has been modularized into macros to aid in
maintaining the macro and understanding what the macro is doing.
The macro now has two additional ways in which to subset the data being run through the
Analysis Program without having to create separate calls of the Analysis Program. With
SUBSET = 2, the Analysis Program runs the case-mix model on the entire data set but does the
plan/entity comparisons at the subset levels specified in the fourth column of the plan detail file
created by the user. With SUBSET = 3, the Analysis Program does both the case-mix and the
plan/entity comparisons at the subset levels.
Data sets are now created for the output of the case-mix and hypothesis test calculations. This
allows for easy export to Excel or other programs for report generation.
The composite measures are no longer restricted to the “How Often” (1-4) question
responses. The variable type is indicated in the macro call and the macro runs a composite
measure calculation if the number of variables is greater than one. This change was made to
accommodate the need to create composite measures from questions with dichotomous and
trichotomous variables. The program can now create composite measures using all variable types
used in the survey.
The weighting of the composite measure items now has the option of doing equal weighting
across items as well as weighting based on the number of responses in each item divided by the
total number of responses in all items. The default option for the macro is to use the equal
weighting.
An option is available for recoding the global rating scales from 0 – 10 to 1 – 3 and the “How
Often” scales from 1 – 4 to 1 – 3 using the new parameter RECODE. The primary rationale for
the recoding into three categories is to make the data entering into the hypothesis tests entirely
consistent with the information presented in the “Bar Graph” reports.
A secondary rationale for recoding is that it may improve the statistical properties of the tests. On
general statistical principles, it would not be surprising if the analysis of very skewed data were
improved by a transformation that reduced the skewness. In the CAHPS survey, it is plausible
that the difference between 0 and 2, both indicating strong dissatisfaction, carries with it less
information than the difference between 8 and 10, reflecting average and maximum satisfaction,
respectively. Therefore, combining categories at the low end of the scale may remove some
meaningless variation from the data. Statistical improvement would be reflected in larger values
of the F-statistic in the recoded data compared to the original data.
The recoding is defined as:
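| Option | Rating scale response value | Recode | “How Often” scale response value | Recode |
|---|---|---|---|---|
| Option 1 | 0 – 6 | 1 | 1 – 2 | 1 |
| Option 1 | 7 – 8 | 2 | 3 | 2 |
| Option 1 | 9 – 10 | 3 | 4 | 3 |
| Option 2 | 0 – 7 | 1 | | |
| Option 2 | 8 – 9 | 2 | | |
| Option 2 | 10 | 3 | | |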
A new parameter, KP_RESID, has been added to the macro call to allow the residual values
from the regression to be saved as a permanent SAS data set. By default, these values are only
saved temporarily while the macro is running.
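The recoding table above can be expressed as two small Python functions. This is a sketch for illustration only: the function names and the `option` argument are hypothetical and do not correspond to the values accepted by the RECODE macro parameter. The "How Often" recode is implemented once, as listed under Option 1, because the table gives no separate "How Often" entries for Option 2.

```python
def recode_rating(value, option=1):
    """Recode a 0-10 global rating into three categories (1, 2, 3)."""
    if not 0 <= value <= 10:
        raise ValueError("rating must be between 0 and 10")
    if option == 1:
        low_max, mid_max = 6, 8   # 0-6 -> 1, 7-8 -> 2, 9-10 -> 3
    elif option == 2:
        low_max, mid_max = 7, 9   # 0-7 -> 1, 8-9 -> 2, 10 -> 3
    else:
        raise ValueError("option must be 1 or 2")
    if value <= low_max:
        return 1
    if value <= mid_max:
        return 2
    return 3


def recode_how_often(value):
    """Recode a 1-4 "How Often" response into three categories (1, 2, 3)."""
    mapping = {1: 1, 2: 1, 3: 2, 4: 3}
    if value not in mapping:
        raise ValueError("response must be 1, 2, 3, or 4")
    return mapping[value]


recoded = [recode_rating(v, option=1) for v in range(11)]
```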
Version 3.0-3.3 of the CAHPS SAS Analysis Program added the following enhancements and changes:
The plan detail file, plandtal.dat, and the filename statement that assigns PLAN_DAT are
optional. If the plan detail file does not exist, then the macro uses the PLAN variable in the data
set called by the CAHPS macro. If used, the plan detail file must have a unique record for each
plan name or code. Only the first column is required; if the second column is missing, then the
macro creates dummy values for the new plan name equivalent to the first column. If the third
and fourth columns have missing values, then they are all set to the value of 1. Each column must
be separated by spaces.
The Analysis Program removes from the analysis any plan that has zero or one
usable record. These changes were made in the submacro USABLE. The plans that are dropped
by the macro are saved in a permanent SAS data set labeled dp&outname.
The CHILD variable is optional. If it does not exist, then the macro creates the variable CHILD.
If the ADULTKID parameter is set to 2, then the macro assumes all records in the analysis data
set are child records and sets CHILD = 1, otherwise CHILD will be set to 0, indicating there are
no child records. If there is a mix of child and adult records in the data set, the user must set up a
variable named CHILD and set it equal to 1 for child records and some other value, usually 0 for
adult records. Version 3.3 of the CAHPS macro corrects a logic error found in version 3.2 of the
macro.
The EVEN_WGT parameter can apply individual level weights to the composite measure items.
This third option is activated by setting EVEN_WGT=2 and uses the weight variable, referenced
by WGTRESP.
The variance of the mean variable, vp, was added to the text output of the adjusted mean report.
A CAHPS version label was added to the permanent data sets to indicate which version of the
CAHPS Analysis Program created the data set. The version number was also added to the text
output.
Users can case-mix the triple-stacked bar frequencies, using the ADJ_BARS parameter, and
include both the non-case-mixed frequencies and the case-mixed frequencies in the final
frequency output data set, n_*. Variables of type 5 (vartype = 5) cannot have case-mixed bars
since the frequencies for the response values are not aggregated into three bars. To
make this work for nonstandard variable types, it is best to do some recoding first to make the
three desired ranges and then run the new variable through as a vartype = 4.
The following parameters were added:
The parameter ID_RESP stores the original respondent ID value, if one exists, in the permanent
data sets. If there is a unique variable in the data set that identifies each respondent, then enter the
variable name in this parameter. The macro carries it through the individual data sets and attaches
it to the residual data set if KP_RESID = 1 so the data set can be easily linked to the original if
needed. If no ID variable is entered, then the ID_RESP variable in the macro is set to ‘.z’. The
variable will be a character and have a maximum of 50 characters.
The parameter flag OUTREGRE indicates whether or not the regression output should appear in
the text output file. If set to 0, the default, then the SAS printed output from the regressions in the
case-mix procedure is not printed out into the output file. If set to 1, then the regression output
appears.
The parameter WGTRESP accepts the variable name that contains the weights for individual
respondents. This weight is used in the case-mix adjustment regression procedure.
The parameter WGTMEAN accepts a variable that contains the weights to be applied to the
means of the plans before the case-mix adjustments are applied.
The parameter SPLITFLG allows the data set to be split into two groups for the purpose of
centering the means differently and running two case-mix models through the macro. This was
done to deal with the Medicare Managed Care and Fee-for-Service analysis. By default, the
parameter is 0 and is not used but, if set to 1, then the data set must contain a variable with the
name SPLIT and must have the values of 0 and 1. Any record with a missing value is dropped
from the analysis.
The parameter BAR_STAT stores the results of the case-mixed bars in permanent data sets with
the same format as the case-mixed survey question results. The new data sets created have the
format B#&outname and F#&outname where the B* files hold the stars and statistics by plan and
the F* files hold the overall means and statistics. The # has the values 1-3 for a normal macro run,
where 1 = the first bar frequency, 2 = the second bar frequency, and 3 = the third bar frequency if
it is not dichotomous. &outname is the value given in the macro call parameter OUTNAME. If
the data are stratified and stratification weights are used by having the macro parameter
WGTDATA = 2, up to six additional files are created with # having the values A-C, where A =
the first bar frequency of the combined strata, B = the second bar frequency of the combined
strata, and C = the third bar frequency of the combined strata.
Version 3.3 corrects a logic error, contained within version 3.2, that occurred when the parameter
SUBSET = 3, which runs the macro multiple times based on the subsetting variable in the plan
detail file referenced by the FILENAME PLANDTAL statement.
The text output on the Warnings and Parameter Info page contains more accurate information
about the adjusters when there are child interactions, when ADULTKID = 1. The number of
adjusters will reflect the original adjuster variables times 2 plus 1, so if there are originally 2
adjusters, the total number of adjusters with child interactions will be 5, ADJ#1, ADJ#2, ADJ#1 *
CHILD, ADJ#2 * CHILD, and CHILD.
Two flag lines added to the log file indicate if the macro finds the CHILD and PLAN variables in
the original analysis data set. If there is no child variable, the flag indicates how the macro
created a new CHILD variable.
Version 3.4 (May-June 2003) of the CAHPS SAS Analysis Program added the following enhancements
and changes:
Added three additional variables to the sa* data set and the output text of the statistical tests. The
unweighted, unadjusted plan mean was added to help clarify what the unadjusted mean actually
is. Only when the wgtmean parameter is used will the unweighted, unadjusted mean be different
from the weighted unadjusted mean. The other variable added is the 95% Confidence Limits for
the Difference of the Mean. This is computed as 1.96 * the standard error of the difference. When
wgtplan = 1, then a third column containing the summed weights for each plan will also be added
to the sa* data set, the b* data set if frequency bars are to be stored (bar_stat = 1) and the output
text.
Added in the weighted, unadjusted frequencies to the frequency table n_* data set and the output
text, when the frequency bars are also case-mix adjusted.
Expanded the purpose of the wgtmean parameter to allow the use of the sum of the weights to the
plan level to be used in the comparison of the plan means. If a variable exists for the wgtmean
parameter, then the individual record level weight is used to compute the weighted, unadjusted
plan means. In addition, if the new parameter wgtplan = 1, then the sum of the individual weights
to the plan level will be used in weighting the plan mean comparisons. The wgtplan parameter
can have the value of 0, default, or 1. When 0, the macro will use equal weights when comparing
the plan means. When 1, and the wgtmean parameter has a variable listed, then the sum of the
weights to the plan level will be used computing the overall and grand means which are used in
the statistical comparisons of the plan means.
Added checks on the DATASET parameter to make sure it exists or that the value in the
DATASET parameter is a valid SAS data set. If there is an error, the macro will stop processing
and print an error message to the log file.
Added error checking on the merging of the plan detail file with the analysis data set. If there are
no records matching, then the macro will print out the frequencies of the unique PLAN values for
both the plan detail file and the analysis data set to the output text file and also print out an error
to the log file.
Version 3.5 (September 2005) of the CAHPS SAS Analysis Program added the following enhancements
and changes:
A disclaimer and copyright statement were added.
If weights are being used for the individual or plan means, records with weights that are less than
zero or missing are removed.
When the macro converts the numeric plan in allcases to character, it left-justifies the value and trims
trailing blanks.
The macro checks that there are plans in all subcodes after the usable data set is made. If some
subcodes have all missing plans, it recomputes how the subcodes are used in the looping in the
star macro.
The log comment issued when the child variable is not found in the original data set was changed.
A bug was identified in the CAHPS 3.4b macro: two lines declared length planname $ 20 when the length
should have been $ 40, causing a merge problem with the N_* data sets. $ 20 was changed to $ 40.
Version 3.6 (April 2006) of the CAHPS SAS Analysis Program added the following enhancements and
changes:
This new version corrects an error in some previous versions affecting calculation of the
variances for the comparison of an entity mean to the mean of all other plan means, when the
plans were weighted. This error only affects analyses with parameter wgtplan=1 using CAHPS
macro versions 3.4b (released May 2003) and 3.5 (released September 2005). By default, the
macro sets wgtplan=0 so the error does not affect unweighted plan analysis.
The error caused significance tests to be calculated incorrectly when determining whether an
entity's mean was significantly above or below the average. This could cause some plans to be
declared 1- or 3-star plans when they were respectively below or above average, but not by a
statistically significant amount.
(July 2006) Modified the formula for the special case in which only one plan or entity is
analyzed, where a division-by-zero error could occur. This case worked in prior versions.
Modified the code that checks whether SE is missing so that T is set to 0 in that case. In
addition, VO can now have a zero denominator when only one entity is analyzed; the code was
modified to catch that error.
(3.6b as of June 2007) This modification to Version 3.6 puts the _wgtmean variable in the strata
data step in order to address a problem with a missing line that was not keeping the _WGTPLAN
variable in the data step that created wstemp. Because of the missing line, the use of wgtdata=2
for combining strata generated a SAS error.
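The July 2006 item above describes setting T to 0 when SE is missing. A guard of that kind might look like the following sketch. It is not the macro's actual code: the data set tstats and the variables entity, diff, se, and t are hypothetical, and treating a zero SE the same way as a missing SE is an assumption made here for illustration.

```sas
/* Hypothetical per-entity differences from the overall mean and their
   standard errors. Entity B has a missing SE. */
data tstats;
   input entity $ diff se;
   /* Guard: with a missing or zero SE, diff / se is undefined,
      so the test statistic is set to 0 instead. */
   if missing(se) or se = 0 then t = 0;
   else t = diff / se;
   datalines;
A 0.12 0.05
B 0.30 .
;
run;

proc print data=tstats noobs;
run;
```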
Version 4.0 (September 2011) and 4.1 (April 2012) of the CAHPS SAS Analysis Program added the
following enhancements and changes:
The part of the code that creates the plandtal data set (in the usable macro program) was
modified. This change affects results only when subset = 3.
The calculation of weights for the composite measure items was modified. The sum of weights
based on the number of responses from each item is used as the weight of the composite measure
case. Also, the calculation of item weights for even_wgt = 1 was modified. For more details about
how the weights are computed, please see the Explanation of Statistical Calculation section.
A new warning note was added to the macro output (in the mkreport macro program). The note
lists the plan IDs that have zero responses in the measured items.
A new option for smoothing variances was added. Users can optionally assign a weight parameter
called smoothing to the variances. The default is smoothing = 0, which provides the original
variances. If smoothing is greater than zero, the value that the user enters is used as the
weight for the variances. If smoothing is less than zero, the macro computes the weight
automatically. For more details about how the macro computes that weight, please see
the Explanation of Statistical Calculation section.
The SAS procedure PROC STANDARD was replaced with PROC STDIZE. If adjusters are specified, the
macro centers all adjusters before it runs the regression procedure. PROC STANDARD did not work
when an adjuster contained only a single repeated value, so it did not standardize those values
correctly. PROC STDIZE is able to handle this situation.
(April 2012) Modified the code for computing adjusted composite measure means when the composite
measure even weight option (even_wgt = 1) is selected. The macro computes the weights for all
entities regardless of sample size. In the prior version, this produced incorrect adjusted means
when some entities were excluded from the final analysis because of their sample size. The
weights can therefore differ across items depending on the value of K, the minimum number of
responses that users assign for each composite measure item. Version 4.1 handles this case
and provides appropriate adjusted means.
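The replacement of PROC STANDARD with PROC STDIZE, noted in the Version 4.0 and 4.1 list, can be illustrated with a small example. This is only a sketch of centering with PROC STDIZE, which requires SAS/STAT. The data set adjusters and the variables age, health, and const are hypothetical, and METHOD=MEAN is an assumption chosen here because it centers without rescaling; the options the macro actually passes may differ.

```sas
/* Hypothetical adjuster variables. const takes the same value on every
   record, which is the situation described in the note above. */
data adjusters;
   input age health const;
   datalines;
1 2 5
2 3 5
3 5 5
4 4 5
;
run;

/* METHOD=MEAN subtracts each variable's mean and leaves the scale at 1,
   so the variables are centered but not rescaled. */
proc stdize data=adjusters out=centered method=mean;
   var age health const;
run;

/* After centering, each variable should have a mean of zero. */
proc means data=centered mean min max;
   var age health const;
run;
```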
Version 5.0 (November 2016) of the CAHPS SAS Analysis Program added the following enhancements
and changes:
(November 2016) Updated weighted variance estimation. This update to the CAHPS macro
corrects an error in the previous version, which failed to take differential weighting at the
individual level into account in variance estimation. Mean scores were not affected by this
update.
(February 2017) Added a new weight option for calculating case-mix regression coefficients
(WT_TYPE). One option (WT_TYPE = 1) is to weight equally by entity. The other option
(WT_TYPE = 2) is used when population-weighted regression coefficients are of interest. The
default (WT_TYPE = 0) is to weight by number of respondents.
(February 2017) Added weighted overall mean option (OVERALL_WT). OVERALL_WT = 0
uses number of respondents, OVERALL_WT = 1 assigns equal weight, and OVERALL_WT = 2
uses the plan weight assigned in &WGTPLAN, which is the default in the program.
(February 2017) Modified the default value of the option that suppresses the results of
regression models (OUTREGRE). The default value is now 1 instead of 0.
(February 2017) Added VARDEF option to PROC MEANS. This affects the calculation of
weighted standard deviation.
(February 2017) Added PROC SURVEYREG as one of the regression options (PROC_TYPE).
Users can select either PROC REG or PROC SURVEYREG. The SURVEYREG procedure is
designed to handle complex survey sample designs, and one design feature, the clustering
option, was added in Version 5.0. Selecting PROC_TYPE = 1 runs PROC SURVEYREG. The
default is PROC_TYPE = 0, which uses PROC REG for the regression.
(February 2017) Updated the content of the case-mix regression coefficients output file
(C_&OUTNAME). The output contains the standard errors and the p-values for each of the
case-mix estimates.
(February 2018) Modified the composite measure calculation. The overall mean of each item is
computed first, and the item means are then combined into the composite measure mean.
(January 2019) Modified the Test SAS data set to change from the Health Plan Survey version
4.0 to the Clinician & Group Survey version 3.0.
(June 2020) Modified all test programs to correspond with the updated Test SAS data, and
clarified instructions. Numbered programs to delineate which should be run first and separated
the steps required to prepare the data for analysis from the macro call statements.
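The PROC SURVEYREG and VARDEF items in the Version 5.0 list refer to standard SAS features, which can be tried outside the macro. The following is a minimal sketch, not the macro's internal code. It requires SAS/STAT for PROC SURVEYREG. The data set resp and the variables plan, wgt, rating, age, and health are hypothetical, and VARDEF=WDF is only one of the available divisors, chosen here for illustration.

```sas
/* Hypothetical respondent-level data: plan is the cluster,
   wgt is an individual-level weight. */
data resp;
   input plan $ wgt rating age health;
   datalines;
A 1.0 8 3 2
A 1.2 9 4 1
A 0.8 7 2 3
B 1.5 6 5 4
B 0.9 8 3 2
B 1.1 10 1 1
C 1.3 7 4 3
C 0.7 9 2 2
C 1.0 5 5 5
D 1.4 8 2 1
D 0.6 6 4 4
D 1.0 9 3 2
;
run;

/* Weighted regression with respondents clustered within plan. */
proc surveyreg data=resp;
   cluster plan;
   model rating = age health / solution;
   weight wgt;
run;

/* Weighted mean and standard deviation. VARDEF=WDF uses the sum of the
   weights minus one as the variance divisor. */
proc means data=resp mean std vardef=wdf;
   weight wgt;
   var rating;
run;
```

Changing the VARDEF value (DF, N, WDF, or WEIGHT) changes the divisor and therefore the weighted standard deviation, while the weighted mean stays the same.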