Instructions for Analyzing Data from CAHPS® Surveys in SAS

Using the CAHPS Analysis Program Version 5.0

AHRQ Contract No.: HHSP233201500026I/HHSP23337004T

Managed and prepared by:

Westat, Rockville, MD

Naomi Yount, Ph.D.

Kayo Walsh, M.S. (Harvard Medical School)

Alan Zaslavsky, Ph.D. (Harvard Medical School)

Edited by Lise Rybowski, MBA

AHRQ Publication No. 20-M019

August 2020

Updated 8/2020

Public Domain Notice. This product is in the public domain and may be used and reprinted without permission in the United States for noncommercial purposes, unless materials are clearly noted as copyrighted in the document. No one may reproduce copyrighted materials without the permission of the copyright holders. Users outside the United States must get permission from AHRQ to reprint or translate this product. Anyone wanting to reproduce this product for sale must contact AHRQ for permission.

CAHPS® is a trademark of AHRQ.

Suggested Citation:

Yount, N., Walsh, K., & Zaslavsky, A. Instructions for Analyzing Data from CAHPS® Surveys in SAS Using the CAHPS Analysis Program Version 5.0. (Prepared by Westat, Rockville, MD, under Contract No. HHSP233201500026I). Rockville, MD: Agency for Healthcare Research and Quality; August 2020. AHRQ Publication No. 20-M019.

The authors of this report are responsible for its content. Statements in the report should not be

construed as endorsement by the Agency for Healthcare Research and Quality or the U.S.

Department of Health and Human Services.

No investigators have any affiliations or financial involvement (e.g., employment, consultancies,

honoraria, stock options, expert testimony, grants or patents received or pending, or royalties) that

conflict with material presented in this report.

Contents

1. Introduction
2. What Does the CAHPS Analysis Program Do?
3. What Is Included in the CAHPS Analysis Program?
4. Pre-Analysis Decisions
5. Software Requirement and Data File Specifications
6. CAHPS Macro Parameters and Call Statements
7. SAS Data Sets Generated by the CAHPS Analysis Program

Appendices

Appendix A. Using the Test Data in the CAHPS Analysis Program
Appendix B. Statistical Explanation of Macro Parameters
Appendix C. Summary of Features Included in Each Version of the CAHPS Analysis Program

Tables

Table 5.1 Yes/No Variables
Table 5.2 Three Response Variables
Table 5.3 Four-Point Frequency Scale Variables
Table 5.4 Example of Creating Dummy Variables – What is your age?
Table 6.1 Required Parameters for the CAHPS Macro 5.0
Table 6.2 Optional Parameters for CAHPS Macro 5.0
Table 6.3 Original Data and Data Used for Analysis
Table 7.1 SAS Data Sets Output for All CAHPS Macro Calls
Table 7.2 Additional SAS Data Sets Output from Case-Mix Adjustment
Table 7.3 Additional SAS Data Sets Output from Saving Case-Mix Adjusted Frequencies
Table 7.4 Additional SAS Data Sets Output from Post-Stratification Weighting Option
Table 7.5 Additional SAS Data Sets Output from Saving Case-Mix Adjusted Frequencies for Post-Stratification Weighting Option
Table 7.6 Contents of DP&OUTNAME: Plans Dropped
Table 7.7 Contents of LR&OUTNAME: Lists Plans with 100 or Fewer Records
Table 7.8 Contents of N_&OUTNAME: Response Option Percentages
Table 7.9 Contents of OA&OUTNAME: Results from Test for Significant Differences
Table 7.10 Contents of P_&OUTNAME: Percent Missing Data
Table 7.11 Contents of SA&OUTNAME: Plan Level Results
Table 7.12 Contents of C_&OUTNAME: Case-mix Adjustment Regression Coefficients
Table 7.13 Contents of R2&OUTNAME – R-Squared Results for the Case-mix Adjustment Regression
Table 7.14 List of Macro Results – Y_&OUTNAME
Table 7.15 List of Macro Results – NW&OUTNAME (Similar to N_&OUTNAME but provides the post-stratification weighted results)
Table 7.16 Contents of OW&OUTNAME (Similar to OA&OUTNAME but provides post-stratification weighted results)
Table 7.17 Contents of SW&OUTNAME (Similar to SA&OUTNAME but provides post-stratification weighted results)
Table A.1 Sample Data for Post-Stratification Weighting
Table A.2 Description of test data set variables based on CAHPS Clinician & Group Adult Survey 3.0
Table B.1 List of Macro Weight Options
Table B.2 Case Weighting Used for Case-mix Coefficients
Table B.3 Examples of a composite measure with three items using a macro parameter K
Table B.4 Examples of composite measure with three items using a macro parameter K and means

1. Introduction

The CAHPS Analysis Program, often referred to as the CAHPS macro, uses SAS software to provide survey users with a flexible way to analyze CAHPS survey data in order to make valid comparisons of performance. The program can be applied to any of the CAHPS surveys. This document explains how the CAHPS Analysis Program works and how to use the program to analyze and interpret survey results.

2. What Does the CAHPS Analysis Program Do?

The CAHPS Analysis Program is designed to analyze CAHPS survey data by doing the following tasks:

• Calculating scores. The program calculates scores for all survey measures, including individual

survey items, ratings, and multi-item composite measures. (Learn about composite measures in

the box below).

• Adjusting for case mix. The program adjusts the survey data for standard individual case-mix

variables such as respondent age, education, and general and mental health status. This

adjustment makes it more likely that reported differences are due to real differences in

performance rather than differences in the characteristics of enrollees or patients.

• Comparing scores. The output from the program also compares the performance of any specific

entity (e.g., health plan, hospital, provider group) included in the data set to the overall

performance of all entities.

For each CAHPS measure, the main output results from this program include:

• unadjusted scores (e.g., top box scores)

• number of responses used in the analyses

• overall mean score

• case-mix adjusted scores (if applicable)

• case-mix adjuster coefficients (if applicable)

• significance rating between case-mix adjusted scores and overall score

• adjusted percentages for display in three-bar frequency charts (top box, middle box, bottom box)

What Are Composite Measures?

Composite measures combine CAHPS survey questions that measure the same dimensions of patients’

experiences with health care or health plan services. The use of composite measures simplifies the

interpretation of the data, enhances the reliability of the results (because individual survey items are

often less reliable than combinations of multiple items), and facilitates comparisons of performance

across a unit of analysis (e.g., health plan, medical practice, clinician).

3. What Is Included in the CAHPS Analysis Program?

The CAHPS Analysis Program version 5.0 has three core components: a SAS macro program, SAS test

programs (including the formats), and a test SAS data set. You can download a ZIP file with the program

files and data sets from the Agency for Healthcare Research and Quality’s CAHPS web page about

analyzing CAHPS survey data.

The ZIP file contains the following files:

• MACRO_CAHPS50.SAS – This is the core SAS macro program that performs the analyses the

user specifies in the SAS test program. The macro file should not be modified.

• _1_TEST_FORMAT_CAHPS50.SAS – This program creates formats, which are helpful to

view the data with descriptive words instead of the numeric data values assigned in data (e.g.,

“Always” is shown rather than a “4”).

• _2_TEST_PREPDATA_CAHPS50.SAS – This program contains sample code to create

recoded versions of some variables for the macro run (e.g., by creating dichotomous or reverse-

coded variables).

• _3A_TEST_CAHPS50.SAS – This short program provides sample code for calling the macro

program with different analysis options and outputs specified.

• _3B_TEST_CAHPS50_STRATIFIED.SAS – This program contains sample code for calling

the macro with the post-stratification weighting option.

• TEST_CAHPS50_DATA.SAS7BDAT – This sample SAS data set is used with all the test

programs listed above.

• TEST_CAHPS50_DATA_recoded.SAS7BDAT – This sample SAS data set is similar to

TEST_CAHPS50_DATA.SAS7BDAT, except the recoded variables created by

_2_TEST_PREPDATA_CAHPS50.sas have already been created for you. This data set is for

users who do not need to use _2_TEST_PREPDATA_CAHPS50.sas.

Appendix A explains how to use the test data. Appendix B provides a statistical explanation of some of

the macro parameters. Appendix C explains all the changes made to the various versions of the CAHPS

Analysis Program over the years.

4. Pre-Analysis Decisions

The CAHPS Analysis Program offers the user a number of options for analyzing the survey data. Before

preparing to run the program, analysts should make sure that the project team has agreed upon answers to

the following questions. Their implications for the CAHPS Analysis Program are reviewed below. A list

of all macro parameters is in Section 6.

What is the reporting unit (entity)?

Any analysis of CAHPS data is intended to assess, compare, and report on some type of reporting unit.

Examples of such units include health plans, hospitals, provider groups, clinics, sites of care, and

individual physicians. To avoid confusion, these instructions use the neutral term “entity” to refer to the

unit whose data will be aggregated and analyzed.

Because the CAHPS Analysis Program was initially written for the CAHPS Health Plan Survey, the

reporting unit variable name used in the Analysis Program is “Plan.” This name has no bearing on

the suitability of the program for analyzing data on other types of entities.

Depending on your data collection design, you may be able to use the same data for more than one type of

entity. For example, you could analyze a data set to compare provider groups and then analyze the same

data to assess individual doctors.

Do you need to adjust the results for case mix?

Case mix refers to the distribution of respondents’ health status and sociodemographic characteristics,

such as age or educational level, that may affect survey responses. Without an adjustment, differences

among entities could be due to case-mix differences rather than true differences in quality. (See Case-mix

Adjustment in Appendix B for more information.)

The Analysis Program offers an ADJUSTER macro parameter to include case-mix adjustment as well as

an IMPUTE macro parameter to impute case-mix variables if your case-mix variables have some missing

values.
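As a rough illustration, a macro call that requests case-mix adjustment with imputation might look like the sketch below. It assumes the macro file has already been loaded, that a data set named test exists in the WORK library, and that it contains PLAN, the analysis item q05_re, and three adjuster variables. The adjuster names ghr, age, and education and the output name illadj are placeholders chosen for this example.

```sas
/* Sketch only: assumes MACRO_CAHPS50.SAS has already been loaded          */
/* and that WORK.TEST holds PLAN, q05_re, and the three adjusters below.   */
%cahps(
   var      = q05_re,                    /* item being analyzed */
   vartype  = 1,                         /* dichotomous scale */
   name     = Make appt for an illness,  /* label for the measure */
   adjuster = ghr age education,         /* placeholder case-mix adjusters */
   impute   = 1,                         /* impute missing adjuster values by plan */
   adultkid = 3,                         /* analyze adult data only */
   dataset  = test,                      /* input data set */
   outname  = illadj                     /* placeholder stem for output data sets */
);
```

Leaving out the ADJUSTER parameter produces unadjusted results only.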

Will you analyze adult and child surveys together?

The Analysis Program allows users to specify how child and adult surveys will be analyzed. The project

team needs to decide whether to analyze surveys about adults and children separately or together. If you

are analyzing adult and child survey data together, the team must also decide whether to consider

interaction effects. Interaction effects may be an issue when the impact of age or health status on one of

the reporting items depends on whether you are analyzing an adult or child survey. You can adjust for

interaction effects when combining adult and child data by using the ADULTKID macro parameter. (See

Variable CHILD in Section 5.)

What p-value to test statistical significance will you use in the analysis?

A p-value of 0.05 is frequently used to test for statistically significant differences between the entities

being compared. If you choose a different p-value, you can specify it in the Analysis Program using the

PVALUE macro parameter.

What, if any, level of substantive (practical) significance will you use to compare

performance?

Substantive significance refers to an absolute difference between the entities being compared that must be

achieved for that difference to be considered meaningful. For example, two health plans may have

statistically significant differences in average scores based on the selected p-value, but the substantive

difference between the plans’ mean scores may not be large enough to be considered meaningful.

The Analysis Program has two options that allow the user to specify a difference that is substantive. You

can use these options simultaneously or specify only one.

First method. The team decides on a percentage of the distance to the nearest bound that would be a

meaningful difference between entities. You can enter this fraction in the Analysis Program using the

CHANGE macro parameter.

Second method. A much simpler method is to specify an absolute difference that must exist between the

entity’s mean score and the mean score for all entities in the analysis for a difference to be considered

substantive. For this method, you can specify the absolute difference considered to be meaningful using

the MEANDIFF macro parameter.
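The sketch below shows where the PVALUE and MEANDIFF parameters would sit in a macro call. The values 0.01 and 0.1 are assumptions chosen only for illustration, as is the output name illsig; the expected format and defaults for each parameter should be confirmed against Table 6.2 and Appendix B before use. The CHANGE parameter is omitted here because its value depends on the fraction the project team agrees on.

```sas
/* Sketch only: illustrative values, not recommendations. */
%cahps(
   var      = q05_re,
   vartype  = 1,
   name     = Make appt for an illness,
   adultkid = 3,
   dataset  = test,
   outname  = illsig,   /* placeholder stem for output data sets */
   pvalue   = 0.01,     /* assumed format: significance level as a proportion */
   meandiff = 0.1       /* assumed format: absolute difference from the overall mean */
);
```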

Do results need to be analyzed using weighting?

In general, weights carry information that helps to make the data more representative of the target

population whose experiences are being assessed. Weighting can arise at three points in the computations

performed by the Analysis Program:

(1) Estimation of case-mix regression coefficients

(2) Calculation of adjusted entity mean scores

(3) Calculation of an overall mean score and significance tests of differences from the overall mean

score.

To set up weights in the Analysis Program, you can use the WGTRESP, WGTMEAN, and WGTPLAN

macro parameters. (See Case Weighting in Appendix B.)

How do you want to weight the items in the composite measure?

A composite measure requires a more elaborate computation to develop the mean score because it

includes more than one item. Users can decide to calculate the composite measure score by selecting one

of the weight options: 1. Weight the items by the number of respondents, 2. Weight the items by the sum

of the respondent weight, or 3. Equal weight (Weight the items equally, calculate as the average of the

number of items). For the equal weight option, users can select an option to adjust the item weight if some

of the items in the composite measure have a low response rate. (See Case Weighting in Appendix B.)
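To make the difference between respondent-count weighting and equal weighting concrete, the sketch below combines three made-up item means for a single entity in both ways. The data set item_scores and its values are invented for this illustration, and the calculation is a plain weighted average rather than the macro's own code; Appendix B describes the computation the macro actually performs.

```sas
/* Invented item-level results for one entity */
data item_scores;
   input item $ mean_score n_resp;
   datalines;
q1 3.2 100
q2 3.6 50
q3 3.8 10
;
run;

proc sql;
   select sum(mean_score * n_resp) / sum(n_resp) as composite_by_n_resp,
          avg(mean_score)                        as composite_equal_weight
   from item_scores;
quit;
```

With these numbers the respondent-weighted composite is 538 / 160 = 3.3625, while the equally weighted composite is 10.6 / 3, or about 3.53. The item with only 10 respondents pulls the equally weighted score up more than it does the respondent-weighted score.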

5. Software Requirement and Data File Specifications

SAS Software Requirement

The CAHPS Analysis Program was developed using SAS software. Running the program requires Base

SAS software and the SAS/STAT module. Base SAS, which is required to use any SAS product, provides

the print commands, simple plotting capabilities, and procedures for descriptive statistics needed to run

the Analysis Program. The SAS/STAT module adds statistical procedures, including the regression procedures PROC REG and PROC SURVEYREG, which perform part of the case-mix calculations.
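For readers unfamiliar with the two procedures, the sketch below shows a bare call to each. It is not the macro's internal code. It assumes a data set named test with an outcome q05_re and adjusters named ghr, age, and education (placeholder names). The CLUSTER statement reflects the note in Table 6.2 that the macro sets CLUSTER = PLAN when PROC SURVEYREG is selected.

```sas
/* Ordinary least squares regression */
proc reg data = test;
   model q05_re = ghr age education;
run;
quit;

/* Survey regression with respondents clustered within entity */
proc surveyreg data = test;
   cluster plan;
   model q05_re = ghr age education;
run;
```

Running either step successfully is a quick way to confirm that SAS/STAT is licensed and installed.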

Data Set Structure

Each row or case in a SAS data set represents data from a unique questionnaire. Appendix A offers

examples of how to meet many of the variable coding and cleaning requirements before using the

Analysis Program.

If data from different CAHPS questionnaires are in the same data set, responses for equivalent questions

are listed under the same variable names, with each row representing data for a unique questionnaire.

Sample Size Requirements

Number of entities (e.g., health plans or providers). The data set must have surveys from at least two

entities. If there is only one entity in the data being analyzed, statistical comparisons cannot be performed

and some parts of the program will not work properly. All the reports will still be produced, but some of

the results will be of limited value.

Number of responses per entity. The Analysis Program requires at least two responses per entity. The

program flags entities with fewer than 100 responses for an individual measure, but performs the analysis

on all entities with at least two records. Including entities with very little data may reduce the precision of

comparisons between individual entities or providers and the overall mean scores.
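Before running the macro, it can help to check how many usable responses each entity has for a given measure. The sketch below is one way to do that, assuming a data set named test with the entity variable PLAN and an analysis item q05_re. The output data set resp_counts and the variable size_flag are names made up for this example.

```sas
/* Count nonmissing responses to q05_re for each entity */
proc means data = test noprint nway;
   class plan;
   var q05_re;
   output out = resp_counts (drop = _type_ _freq_) n = n_resp;
run;

/* Label entities against the thresholds described above */
data resp_counts;
   set resp_counts;
   length size_flag $ 20;
   if n_resp < 2 then size_flag = 'Fewer than 2';
   else if n_resp < 100 then size_flag = 'Fewer than 100';
   else size_flag = 'OK';
run;

proc print data = resp_counts noobs;
run;
```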

Variable Naming Requirements

The variable names PLAN, CHILD, VISITS, and SPLIT have specific meaning for the Analysis

Program. If the data set has other variables with these names that do not conform to the specifications

below, the macro may produce errors in the log file and the results may be erroneous. Additionally, the program uses variables whose names start with RES_ in its array statements, so variables in your data set with such names may cause errors.

Variable PLAN. The variable PLAN, which refers to the reporting unit or entity, must be included in

your data set. While the variable can be any type of entity, it must be called PLAN for the Analysis

Program to work.

The Analysis Program accepts alphanumeric, character, and numeric formats for this variable in the data

set. Note that this is the only variable that does not have to be coded numerically. The maximum variable

length for PLAN is 40 characters.
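If the entity identifier in your file has a different name, a short DATA step can copy it into PLAN. In the sketch below, survey_raw and practice_name are hypothetical names, and the input data set is assumed not to contain a variable called PLAN already.

```sas
data analysis;
   length plan $ 40;        /* character PLAN, at most 40 characters */
   set survey_raw;          /* hypothetical input data set */
   plan = practice_name;    /* hypothetical entity identifier */
run;
```

Values longer than 40 characters would be truncated by this assignment, so it is worth checking that the shortened values still identify each entity uniquely.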

Variable CHILD. If your data set combines adult and child data and

• You will analyze the data together (macro parameter ADULTKID = 1) or

• You will conduct child-only analyses (ADULTKID = 2),

you will need to create the dichotomous variable CHILD to distinguish between adult (CHILD = 0) and

child (CHILD = 1) surveys.

If the CHILD variable is missing from the data set, the Analysis Program creates a CHILD variable and

sets it to CHILD = 0 (Adult).
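One way to build CHILD is to set it while stacking the adult and child files. The sketch below assumes two hypothetical data sets, adult_survey and child_survey, in which equivalent questions already share the same variable names.

```sas
data combined;
   set adult_survey (in = from_adult)
       child_survey;
   if from_adult then child = 0;   /* adult survey */
   else child = 1;                 /* child survey */
run;
```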

Requirements for Recoding Survey Response Options

All analytic variables used by the Analysis Program must be numeric. The tables below show the

different types of variables that may need to be recoded.

Yes/No Variables. Variables with “yes/no” response categories for analysis should be coded as 0 (No)

and 1 (Yes) as shown in Table 5.1. All variables with dichotomous response options should be coded in

this manner. For easier interpretation of the results, the “positive” response should have the highest value.

Dichotomous variables will most likely need to be recoded, because the precodes in the survey instrument typically code the responses as 1 (Yes) and 2 (No) rather than 1 and 0.

Table 5.1 Yes/No Variables

Typical Response Value    Recoded Numeric
on CAHPS Surveys          Response Value     Label/Description
2                         0                  No
1                         1                  Yes
Any other value           . (Missing)        Not analyzed by the CAHPS Analysis Program
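The recode in Table 5.1 can be written as a short DATA step. In this sketch, survey_raw and q05 are hypothetical names for the input data set and the raw item; q05_re is the recoded version passed to the macro.

```sas
data analysis;
   set survey_raw;                     /* hypothetical input data set */
   if q05 = 1 then q05_re = 1;         /* Yes */
   else if q05 = 2 then q05_re = 0;    /* No */
   else q05_re = .;                    /* any other value becomes missing */
run;
```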

Three Response Variables. Any variable with three response options should be coded as shown in Table

5.2. For easier interpretation of the results, the “positive” response should have the highest value. Reverse

coding may be necessary to ensure that the most positive response – for example, “Yes, definitely” – has

the highest value.

Table 5.2 Three Response Variables

Typical Response Value    Recoded Numeric
on CAHPS Surveys          Response Value     Label/Description
1                         3                  Yes, definitely
2                         2                  Yes, somewhat
3                         1                  No
All other values          . (Missing)        Not analyzed by the CAHPS Analysis Program

Four-Point Frequency Scale Variables. Variables with “never” to “always” response options are coded

as shown in Table 5.3. All variables with four response options should be coded in this manner. For easier

interpretation, the “positive” response – for example, “Always” or “Definitely yes” – should have the

highest value.

Table 5.3 Four-Point Frequency Scale Variables

Typical Response Value    Recoded Numeric
on CAHPS Surveys          Response Value     Label/Description
4                         1                  Definitely no
3                         2                  Somewhat no/Probably no
2                         3                  Somewhat yes/Probably yes
1                         4                  Definitely yes
All other values          . (Missing)        Not analyzed by the CAHPS Analysis Program
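The reverse coding shown in Tables 5.2 and 5.3 amounts to subtracting the raw value from one more than the number of response options. The sketch below assumes a hypothetical input data set survey_raw with a three-point item q20 and a four-point item q25, both precoded so that 1 is the most positive response.

```sas
data analysis;
   set survey_raw;                                 /* hypothetical input data set */

   /* Three-point item: 1 -> 3, 2 -> 2, 3 -> 1 */
   if q20 in (1, 2, 3) then q20_re = 4 - q20;
   else q20_re = .;

   /* Four-point item: 1 -> 4, 2 -> 3, 3 -> 2, 4 -> 1 */
   if q25 in (1, 2, 3, 4) then q25_re = 5 - q25;
   else q25_re = .;
run;
```

Items whose precodes already run from least positive to most positive need only the out-of-range values set to missing, not the reversal.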

Coding for Case-Mix Adjuster Variables. If the project team decides to adjust the survey results for case mix, each adjuster variable must also be numeric and properly coded. If an adjuster is used as a continuous variable, the effects associated with its categories are assumed to be proportional to the differences among the coded values. This approach differs from recoding the adjuster as a set of dichotomous (dummy) variables with a reference category. (See Appendix A for additional sample SAS code.) The dummy variable corresponding to one category, the “reference category,” should be omitted; coefficients corresponding to any other category represent the estimated effect of being in that category relative to the reference category. While the choice of reference category has no effect on the case-mix adjustment results, it is common to use the category with the most responses (or close to the most) as the reference category. Dummy variable coding allows the differences between effects associated with different responses to be determined by the data rather than assuming any particular pattern.

Table 5.4 is an example of recoding AGE into a set of dummy variables, one of which should be omitted

from the ADJUSTER macro parameter as the reference category. The response categories on the CAHPS

surveys may differ from this example, so please refer to the survey for the appropriate response options.

Table 5.4 Example of Creating Dummy Variables – What is your age?

Typical Response Value                                            Label/Description
on CAHPS Surveys          Recoded Dummy Variables and Value       (years)
1                         1 = Age_24under, 0 = not in the group   18 to 24
2                         1 = Age25_34, 0 = not in the group      25 to 34
3                         1 = Age35_44, 0 = not in the group      35 to 44
4                         1 = Age45_54, 0 = not in the group      45 to 54
5                         1 = Age55_64, 0 = not in the group      55 to 64
6                         1 = Age65_74, 0 = not in the group      65 to 74
7                         1 = Age75older, 0 = not in the group    75 or older
All other values          Code to missing
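A DATA step along the following lines would create the dummy variables in Table 5.4. It is a sketch that assumes a hypothetical input data set survey_raw in which the raw item is named age and takes the values 1 through 7.

```sas
data analysis;
   set survey_raw;                          /* hypothetical input data set */
   array age_dummy {7} age_24under age25_34 age35_44 age45_54
                       age55_64 age65_74 age75older;
   do i = 1 to 7;
      if age in (1, 2, 3, 4, 5, 6, 7) then age_dummy{i} = (age = i);
      else age_dummy{i} = .;                /* all other values: code to missing */
   end;
   drop i;
run;
```

If, for example, the 45 to 54 group were chosen as the reference category, the ADJUSTER parameter would list the other six variables and leave out age45_54.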

6. CAHPS Macro Parameters and Call Statements

The CAHPS Analysis Program requires six key parameters (VAR, VARTYPE, NAME, ADULTKID,

DATASET, and OUTNAME). Table 6.1 below lists the required parameters with the valid value ranges;

these parameters have no default value and therefore must be specified. Table 6.2 lists 23 optional

parameters, also with the valid value ranges. These optional parameters all have default values and do not

need to be specified if the default value is acceptable.

If you are using case-mix adjusters, the ADJUSTER parameter is required. The order of the parameters in

the macro call statement does not matter. Parameters should be separated by a comma.

Table 6.1 Required Parameters for the CAHPS Macro 5.0

Var
   Description: Name(s) of variable(s) being analyzed (composite measure items, global rating items, or other single items).
   Values: Name(s) of variable(s) from SAS data set to include in the analysis (e.g., composite measure items, global rating, or other single items). For composite measure items, separate the variable names by a single space.

Vartype
   Description: Type of variable. Note: variables in composite measures should have the same type.
   Values:
      1 = Dichotomous scale (yes/no 0-1)
      2 = Global rating scale (0-10)
      3 = “How often” scale or other four-point response scale (“never” to “always” scale 1-4)
      4 = Any type of three-point response scale (1-3)
      5 = Other scale (Must assign a value to min_resp and max_resp arguments)

Name
   Description: Description of composite measure, global rating item, or other items.
   Values: Note: This parameter is limited to 40 characters and can be numeric, text, or a combination of both.

Adultkid
   Description: Specifies how to analyze child and adult surveys. Note: If analyzing data other than adult only, the CHILD variable must be included in the data set.
   Values:
      0 = Combine adult and child survey data in analysis; do not consider interaction effects in case-mix adjustment. This option can be used if the data set contains only a single type of survey.
      1 = Combine adult and child survey data in analysis; consider interaction effects between child and each case-mix adjuster variable. For more details, see the box below on Adult and Child Interactions.
      2 = Analyze child data only.
      3 = Analyze adult data only.

Dataset
   Description: SAS data set name to be used in the analysis.
   Values: Name of the SAS data set used in the input file (i.e., your data set of CAHPS survey responses).

Outname
   Description: Part of SAS data set name for output tables created for summary results.
   Values: Name for the SAS data sets saved with the summary results. To avoid creating SAS data sets, enter ‘ ‘. The results tables will still be created for the .out file.

Macro Parameter ADULTKID: Adult and Child Interactions

When the macro parameter ADULTKID equals 1, the macro creates adult and child interactions for the

adjuster variables. The macro creates additional adjuster variables, with the naming convention AC1,

AC2, ..., ACn, where n is the total number of adjusters originally submitted in the macro call parameter

ADJUSTER. When there is an adult and child interaction, the macro creates the ACx variables by

looping through the list of adjusters.

For example:

If your adjuster variables are for general health status (GHR), age, and education, then the

following additional interaction adjuster variables are created:

AC1 = GHR * CHILD

AC2 = AGE * CHILD

AC3 = EDUCATION * CHILD
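The macro builds these terms internally, so no user code is needed. Purely to show what the loop amounts to, the sketch below reproduces the three products in a DATA step. It assumes a hypothetical data set named combined that contains ghr, age, education, and CHILD coded 0/1.

```sas
/* Illustration only: the macro creates AC1-AC3 itself when ADULTKID = 1 */
data with_interactions;
   set combined;                       /* hypothetical input data set */
   array adj {3} ghr age education;    /* adjusters in the order given in ADJUSTER */
   array ac  {3} ac1 - ac3;            /* interaction terms AC1, AC2, AC3 */
   do i = 1 to 3;
      ac{i} = adj{i} * child;
   end;
   drop i;
run;
```

For adult records (CHILD = 0) every interaction term is 0, so the terms only shift the adjuster effects for child surveys.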

Example SAS Macro Call with Only the Required Parameters

The example call statement below includes only the required parameters. Explanatory text is provided

next to each parameter between the “/*” and “*/”. All text in between these characters is commented out

or ignored by SAS. This example shows a single item (q05_re) being analyzed.

%cahps(
   var      = q05_re,                    /* Name of variable to be analyzed */
   vartype  = 1,                         /* Type of variable: 1 = dichotomous scale (1/0) */
   name     = Make appt for an illness,  /* Label for the outcome variable */
   adultkid = 3,                         /* How to analyze child and adult surveys: 3 = analyze adult data only */
   dataset  = test,                      /* Name of the input data */
   outname  = illness                    /* Name used for the output data set */
);
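A composite measure is requested the same way, with the item names separated by single spaces in VAR. The sketch below also shows loading the macro first. The file path, the four item names, the label, and the output name are all placeholders invented for this example, and the items are assumed to be four-point items already coded 1 to 4 in a data set named test.

```sas
/* Load the macro definition (hypothetical location of the unzipped file) */
%include "C:\cahps\MACRO_CAHPS50.SAS";

%cahps(
   var      = q11 q12 q14 q15,          /* placeholder items in the composite */
   vartype  = 3,                        /* four-point response scale (1-4) */
   name     = Provider communication,   /* label, at most 40 characters */
   adultkid = 3,                        /* analyze adult data only */
   dataset  = test,                     /* input data set */
   outname  = comm                      /* placeholder stem for output data sets */
);
```

All items in one composite should share the same VARTYPE, as noted in Table 6.1.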

Table 6.2 Optional Parameters for CAHPS Macro 5.0

Describing Variables

Min_resp
   Description: Used with vartype = 5 only: the minimum response value.
   Values: Can be any numeric value. It will be used as the low value for the valid response options.

Max_resp
   Description: Used with vartype = 5 only: the maximum response value.
   Values: Can be any numeric value. It will be used as the high value for the valid response options.

Recode
   Description: Recodes the global rating and the “How often” or 4-point scales down to three categories before performing the case-mix adjustment and the statistical tests. The default value is 0.
      Default recode into 3 categories is as follows:
         Rating Scale (Vartype = 2): 1. 0-6; 2. 7-8; 3. 9-10
         4-point Scale (Vartype = 3): 1. 1-2; 2. 3; 3. 4
      NOTE: If Vartype is not equal to 2 or 3, then no recoding is done.
   Values:
      0 = For the statistical tests, use uncollapsed response options for the variables in the Var parameter. For the “Percent of each response” table and report, use the default recode collapsing into 3 categories. This method is the default; the RECODE option is not needed in the macro call if it = 0.
      1 = For the statistical tests and the “Percent of each response,” use the default recode collapsed into 3 categories.
      2 = For the statistical tests, use uncollapsed response options for the variables in the Var parameter. For the “Percent of each response” table and report, split the “Rating” scale into three categories with the following break points: 0-7|8-9|10. The recode for this option differs from the default recode such that the top box is just the highest response for the rating scale.
         Rating Scale (Vartype = 2): 1. 0-7; 2. 8-9; 3. 10
         For the 4-point scale, use the default recode collapsed into 3 categories: 1-2|3|4.
      3 = For the statistical tests and the “Percent of each response” table, use the recoded collapsed categories as shown in Recode option 2 above.

For Case-Mix Adjusting

(only “Adjuster” is required for case-mix adjusting; other case-mix parameters have default values

if not specified)

Adjuster

Name(s) of case-mix adjuster

variables

Name(s) of case-mix adjuster variables—separated

by a space if using more than one case-mix variable.

Adj_bars

Flag indicating if the

frequencies for the response

values are to be case-mix

adjusted for the triple stacked

bar (top box, middle box,

bottom box). The default

value is 0.

0 = Do not case-mix adjust the triple stacked

bars (i.e., top box, middle box, bottom

box).

1 = Case-mix adjust the triple stacked bars

(i.e., top box, middle box, bottom box)

and store the adjusted frequencies along

with the unadjusted frequencies.

Impute

Flag for imputation of

missing data for adjuster

variables. The default value is 0.

0 = Do not impute mean values by plan for all

adjuster variables.

1 = Impute mean values by plan for all

adjuster variables.


Proc_type

Assign the procedure type for

the case-mix regression. The

default value is 0.

0 = Run the case-mix model under PROC REG.

1 = Run the case-mix model under PROC

SURVEYREG.

Note: For PROC SURVEYREG, only a cluster

option is available, which sets CLUSTER = PLAN

in the CAHPS Analysis Program.

Saving Files

Bar_stat

Flag indicating if permanent

data sets for the case-mixed

frequencies should be saved.

The default value is 0.

0 = Do not save the statistical results in data

sets for the case-mix adjusted triple

stacked frequency bars (i.e., top box,

middle box, bottom box).

1 = Save the case-mix adjusted statistical

results in permanent data sets for the triple

stacked frequency bars (i.e., top box,

middle box, bottom box).

Kp_resid

Flag used to save the residual values from the SAS work data set RES_4_ID in the STD_DATA module to a permanent data set. The residuals are the response values after case-mix adjustments have been made. The default value is 0.

0 = Do not save the residual response values.

1 = Save the residual response values in a

permanent data set.

Id_resp

If there is a unique variable in

the data set that identifies

each individual respondent,

then this variable name may

be entered here. The default

value is blank.

Blank or the name of a variable in the data set.

This variable can be included in the residual data set when kp_resid = 1. The variable will be stored as a character variable with a maximum length of 50 characters.

Outregre

Flag indicating whether or not

to include the regression

output text created by SAS.

The default value is 0.

0 = No regression output appears in the text

file.

1 = Print out the regression output to the text

file.

Assigning Weights

Wgtplan

Specifies whether or not to

use plan weights for the

plan/entity-level statistical

test. The default value is 0.

Note: Set WGTPLAN = 1

when WGTMEAN is applied.

0 = Do not use the plan/entity weights when

computing the overall mean for the

comparison of plan/entity means. Equal

weighting will be used as in previous

versions of the macro.

1 = Use the weights in the variable specified in the WGTMEAN parameter, summed to the plan/entity level. This weight is used for weighting the overall and grand means used in the F statistic for statistical comparisons of the plan/entity means.

Wgtmean

Name of the variable storing

the weight values for the plan

means. The default value is

blank. Note: Set WGTPLAN

= 1 when this parameter is

applied.

Blank if no weight is assigned for the plan/entity

Specify the name of a variable used for the

plan/entity-level weight.

Wgtresp

Name of the variable storing

the weight values for

individual respondents. The

default value is blank.

Blank if no weight is assigned

Specify the name of a variable used for the response

weight.

Overall Weights

Overall_wt

Weight options for

calculating the overall mean.

The default value is 2.

0 = Number of respondents

1 = Equal weighting of PLANS (each plan is

assigned a weight of 1)

2 = Population: sum of respondent weights

(based on WGTMEAN)

Coefficient Weights

Wt_type

Weight options for

calculating the case-mix

regression coefficients. The

default value is 0.

0 = Number of respondents

1 = Equal weighting of PLANS

2 = Population: sum of respondent weights

(based on WGTRESP)

Composite Measure Weights

Even_wgt

Determines how to weight

items when calculating

composite measures. The

default value is 1.

0 = Weight by overall number of respondents

for each item.

1 = Use equal weighting of items in

composite measures

(1 / # of items).

2 = Weight by summing the respondent

weight (based on WGTRESP)

Assign a target minimum

number of responses for equal

weighting of items in

composite measures

(even_wgt = 1). The default

value is 1.

Number > 0.

Post-Stratification Weighting

Wgtdata

Specifies whether post-

stratification is used in

weighting. The default value

is 1 (no strata weight).

1 = Do not perform post-stratification

weighting.

2 = Combine strata, conduct post-stratification

weighting.


Note: A separate file is

required to specify the strata

variable (See Step 3b. Post-

Stratification Weighting

Case in Appendix A. Using

the Test Data in the CAHPS

Analysis Program).

Analyzing Data Separately

Subset

Perform the case-mix

adjustments and statistical test

based on each subset of plans;

the subset code is a column in

the plan detail file. The

default value is 1.

Note: A separate file is

required to specify the subset

variable (See Step 3b. Post-

Stratification Weighting

Case in Appendix A. Using

the Test Data in the CAHPS

Analysis Program).

1 = No subsetting done. Global case-mix

model and centering.

2 = Global case-mix model with centered

means for each subset before performing

statistical tests.

3 = Subset case-mix model with centered

means for each subset.

Significance Values

Pvalue

Level of significance for

comparisons. The default

value is 0.05.

Valid values are between 0 and 1.

Change

Level of practical significance

based on a percentage

difference from the minimum

absolute theoretical difference

from the overall mean (can be

used only with ‘p-value’

criteria). The default value

is 0.

Value between 0 and 1 (i.e., 25% is entered as 0.25).

Meandiff

Level of practical significance

based on absolute difference

between plan mean and mean

of all plans (can be used only

with ‘p-value’ criteria). The

default value is 0.

Number ≥ 0.
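As a rough illustration of the break points listed for the Recode parameter, the sketch below groups a few rating values with PROC FORMAT. It is not part of the CAHPS Analysis Program; the format names (rate_def, rate_top, four_pt) and the data set name rating_demo are made up for this example.

```sas
/* Illustrative only: format and data set names are hypothetical. */
proc format;
   value rate_def 0-6  = '0-6'    /* default break points for the rating scale */
                  7-8  = '7-8'
                  9-10 = '9-10';
   value rate_top 0-7  = '0-7'    /* break points described under Recode option 2 */
                  8-9  = '8-9'
                  10   = '10';
   value four_pt  1-2  = '1-2'    /* 4-point scale break points */
                  3    = '3'
                  4    = '4';
run;

data rating_demo;
   input rating @@;
   length default_box topbox_only $4;
   default_box = put(rating, rate_def.);   /* category under the default recode */
   topbox_only = put(rating, rate_top.);   /* category when the top box is 10 only */
   datalines;
0 6 7 8 9 10
;
run;

proc print data=rating_demo noobs;
run;
```

Comparing the two derived columns shows which ratings change category between the two schemes (7 and 9 move down one box under the 0-7|8-9|10 break points).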


Example SAS Macro Call with Optional Parameters

The sample call statement below includes some optional parameters. Explanatory text is provided next to

each parameter in between the “/*” and “*/”. All text in between these characters is commented out or

ignored by SAS.

%cahps(
   var      = q05_re q07_re,              /* Name of variables in the composite measure to be analyzed */
   vartype  = 1,                          /* Set the type of variables: 1 = dichotomous scale (1/0) */
   name     = Sample Composite Measure,   /* Label for the outcome variable */
   adultkid = 3,                          /* Specify how to analyze child and adult surveys: 3 = analyze adult data only */
   adjuster = age1824 age2534 age3544
              age5564 age6574 age75 ghs,  /* List of case-mix adjuster variables to include */
   adj_bars = 1,                          /* Flag for the frequencies to be case-mix adjusted */
   bar_stat = 1,                          /* Flag to save case-mix adjusted frequencies */
   impute   = 1,                          /* Flag to impute case-mix adjuster variables that are missing */
   dataset  = test,                       /* Name of the input data */
   outname  = CompositeMeasureName        /* Name used for the output data set */
) ;

For more examples of how to set these parameters, please refer to Appendix A.
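As a further hedged sketch, the call below combines the Recode and weighting parameters described above for a single global rating item. The variable names q_rating and plan_wt and the output name RatingTopBox are hypothetical; only the parameter names and values come from the parameter table. It assumes the %cahps macro has already been loaded and that the input data set test contains these variables.

```sas
/* Hypothetical call: q_rating, plan_wt, and RatingTopBox are illustrative names. */
%cahps(
   var      = q_rating,               /* Single 0-10 global rating item */
   vartype  = 2,                      /* 2 = rating scale */
   recode   = 2,                      /* Percent table uses break points 0-7|8-9|10 */
   name     = Sample Global Rating,   /* Label for the outcome variable */
   adultkid = 0,                      /* 0 = analyze all data */
   wgtmean  = plan_wt,                /* Variable holding the plan-mean weights */
   wgtplan  = 1,                      /* Set to 1 whenever WGTMEAN is applied */
   dataset  = test,                   /* Name of the input data */
   outname  = RatingTopBox            /* Name used for the output data sets */
) ;
```

WGTPLAN is set to 1 here because the parameter notes ask for it whenever WGTMEAN is supplied.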

Cases Dropped When Performing Analyses

The Analysis Program drops some cases when performing analyses based on missing data. This section

uses a small data set with ten cases, two entities, two questions, and two case-mix adjuster variables to

demonstrate which cases will be used for the analyses.

This example follows two paths for the analysis of a composite measure consisting of two items, Q1 and Q2. Run 1 uses no adjuster variables; Run 2 uses two adjuster variables, Adjuster 1 and Adjuster 2, without imputing the within-plan mean for missing adjuster values. Note that each item is equally weighted for the composite measure in both sample runs.

Run 1: CAHPS Macro Call Statement

%cahps(
   var      = Q1 Q2,                      /* Name of the two variables in the composite measure to be analyzed */
   vartype  = 3,                          /* Set the type of variable: 3 = 4-point scale */
   name     = Sample Composite Measure,   /* Label for the outcome variable */
   adultkid = 0,                          /* Specify how to analyze child and adult surveys: 0 = analyze all data (all data is adult) */
   dataset  = test,                       /* Name of the input data */
   outname  = SampCompositeName_Run1      /* Name used for the output data set */
) ;

Run 2: CAHPS Macro Call Statement

%cahps(
   var      = Q1 Q2,                      /* Name of the two variables in the composite measure to be analyzed */
   vartype  = 3,                          /* Set the type of variable: 3 = 4-point scale */
   name     = Sample Composite Measure,   /* Label for the outcome variable */
   adultkid = 0,                          /* Specify how to analyze child and adult surveys: 0 = analyze all data (all data is adult) */
   adjuster = A1 A2,                      /* List of case-mix adjuster variables to include */
   impute   = 0,                          /* Flag for imputing missing case-mix adjusters: 0 = default, do not impute */
   dataset  = test,                       /* Name of the input data */
   outname  = SampCompositeName_Run2      /* Name used for the output data set */
) ;

The macro cleans the two items, Q1 and Q2, to make sure the values are within the valid range for the given variable type. The macro call specifies that they are type 3 variables, which means each response value must be 1, 2, 3, or 4. Any other response value is set to missing. In our small data set, Q1 has a value of 7 (case 10) that is set to missing; all other values are valid. The adjuster values are not cleaned in the macro, so all values are accepted.

The macro checks for missing values in each case and determines whether to keep the record based on the

macro parameter arguments. The results may differ depending on whether adjusters are used and whether

missing adjusters get an imputed mean value. The cases that are dropped for Run 1 and Run 2, and the reasons why, are noted in the last two columns of the table below.

Please note that the variable PLAN refers to your entity of analysis. The periods (.) in Table 6.3 represent

missing values. The case numbers are not a part of the data set; they are used only for reference purposes

later.

Table 6.3 Original Data and Data Used for Analysis

Case | Plan | Composite Measure Item: Q1 | Composite Measure Item: Q2 | Adjuster Adj 1 | Adjuster Adj 2 | Run 1 (No Adj) Case Dropped | Run 2 (With Adj) Case Dropped
1 | A | 2 | 4 | 1 | 1 | |
2 | A | 3 | . | 2 | 2 | |
3 | A | 4 | 2 | 3 | . | | Dropped: Adj 2 Missing
4 | A | 4 | 3 | . | . | | Dropped: Adj 1 & Adj 2 Missing
5 | A | 3 | 3 | 2 | 3 | |
6 | B | 3 | 3 | 2 | 3 | |
7 | B | . | . | 4 | 5 | Dropped: Q1 & Q2 Missing | Dropped: Q1 & Q2 Missing
8 | B | 2 | 2 | 5 | 4 | |
9 | B | 3 | 2 | 6 | 3 | |
10 | B | . (7) | 3 | 3 | 3 | |

Note: The cases in Run 2 that were dropped because of missing adjuster information could be retained by using the parameter “IMPUTE” to impute the missing adjuster information (see Table 6.2).


The macro uses the cases retained from the first cleaning step: nine records for Run 1 and seven cases for

Run 2. The macro reports and summarizes the number of cases used in each analysis, the percent missing

for each variable, and the percent breakdown of the response categories.
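The retention rules implied by Table 6.3 can be sketched in a short DATA step. This is an illustration inferred from the table, not the macro's own code; the data set and variable names (example, flagged, drop_run1, drop_run2) are hypothetical, and the rule that a case is dropped only when every item is missing is an assumption based on cases 2, 7, and 10.

```sas
/* Ten-case example transcribed from Table 6.3; a period is a missing value. */
/* Case 10 keeps its original out-of-range Q1 value of 7 before cleaning.    */
data example;
   input case plan $ q1 q2 a1 a2;
   datalines;
1 A 2 4 1 1
2 A 3 . 2 2
3 A 4 2 3 .
4 A 4 3 . .
5 A 3 3 2 3
6 B 3 3 2 3
7 B . . 4 5
8 B 2 2 5 4
9 B 3 2 6 3
10 B 7 3 3 3
;
run;

data flagged;
   set example;
   array items{2} q1 q2;
   /* Cleaning step for VARTYPE = 3: values outside 1-4 become missing. */
   do i = 1 to dim(items);
      if not (items{i} in (1, 2, 3, 4)) then items{i} = .;
   end;
   /* Run 1 (no adjusters): assumed to drop a case only when all items are missing. */
   drop_run1 = (nmiss(q1, q2) = 2);
   /* Run 2 (adjusters, no imputation): also drop when any adjuster is missing. */
   drop_run2 = (drop_run1 or nmiss(a1, a2) > 0);
   drop i;
run;

/* The SUM column gives the number of dropped cases for each run. */
proc means data=flagged n sum;
   var drop_run1 drop_run2;
run;
```

With these rules, one case is flagged for Run 1 and three for Run 2, leaving nine and seven cases respectively, in line with the counts reported for the two runs.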

Risk of Out-of-Range Values for Case-mixed Scores

In the special cases where there are very few records for an analysis variable or all respondents answered

in only one or two response categories, it is possible for the case-mix adjusted values to be out of range.

For example, if all respondents to a Health Plan Survey answered “Yes,” where 0= “No” and 1= “Yes” to

a yes/no question, and the adjustment increases the mean score for that health plan, the adjusted mean for

that health plan would be greater than one. Further, the adjusted frequencies would be less than zero

percent for the “No” category and greater than 100 percent for the “Yes” category.

The macro does not force a change in these values, since it would change the mean of the means on the

adjusted scores but not on the unadjusted scores. When reporting your CAHPS survey results, it is

important to set these out-of-range values to the minimum or maximum value for that category. If

necessary, you can then make a manual adjustment to the adjacent category. For example, with three response categories, where the minimum frequency is 0 and the maximum is 100, suppose the case-mixed frequency results are as follows:

category 1 = -2.0,
category 2 = 25.0, and
category 3 = 77.0.

The results could be manually adjusted so that

category 1 = 0.0,
category 2 = 23.0, and
category 3 = 77.0.
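The manual adjustment in this example could be scripted along the following lines. This is a minimal sketch, not something the CAHPS Analysis Program does for you; the data set and variable names (bars, bars_fixed, cat1-cat3) are hypothetical, and handling a top category above 100 in the mirror-image way is an assumption.

```sas
/* Case-mixed frequencies from the example in the text. */
data bars;
   cat1 = -2.0;
   cat2 = 25.0;
   cat3 = 77.0;
run;

data bars_fixed;
   set bars;
   /* Bottom category below 0: move the shortfall to the adjacent (middle) category, then reset to 0. */
   if cat1 < 0 then do;
      cat2 = cat2 + cat1;
      cat1 = 0;
   end;
   /* Assumed mirror case: top category above 100 passes its excess to the middle category. */
   if cat3 > 100 then do;
      cat2 = cat2 + (cat3 - 100);
      cat3 = 100;
   end;
   total = cat1 + cat2 + cat3;   /* should still equal 100 */
run;

proc print data=bars_fixed noobs;
run;
```

For the values above, the result is 0.0, 23.0, and 77.0, which matches the manually adjusted figures and still sums to 100.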


7. SAS Data Sets Generated by the CAHPS Analysis Program

The CAHPS Analysis Program creates permanent SAS data sets that contain the results of the analyses

performed for each composite measure, single item measure, or global rating. Tables 7.1-7.5 provide a

listing of the SAS data sets created by the CAHPS Analysis Program as well as their naming conventions.

Tables 7.6-7.17 describe the variables included in the SAS data sets that have been created.

SAS Data Sets Output and Naming Conventions

The SAS data sets implement the following naming conventions where &OUTNAME is the text assigned by the user to the parameter “OUTNAME” in the CAHPS macro call.

Table 7.1 SAS Data Sets Output for All CAHPS Macro Calls

Description | SAS Data Set Naming Convention
Lists any plans dropped by macro with only 0 or 1 records | DP&OUTNAME
Lists any plans with less than 100 responses | LR&OUTNAME
Unadjusted and adjusted (if adjusters are included) percentages for each response option (collapsed into three categories for global rating: 0-6, 7-8, and 9-10, or 0-7, 8-9, 10; for 4-Point Scales: 1-2, 3, and 4) | N_&OUTNAME
Overall statistics for all entities combined | OA&OUTNAME
Percentage missing on the VAR parameter (and Adjusters if included) | P_&OUTNAME
Score details for all entities including significant differences | SA&OUTNAME

Table 7.2 shows the data sets that are created by adding in case-mix adjusters.

Table 7.2 Additional SAS Data Sets Output from Case-Mix Adjustment

Description | SAS Data Set Naming Convention
Regression coefficients for each adjuster variable | C_&OUTNAME
R-squared values | R2&OUTNAME
Residual values (only if parameter KP_RESID = 1) | Y_&OUTNAME
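To see how these prefixes combine with a given OUTNAME value, the short sketch below writes the resulting data set names to the SAS log. The value RatingTopBox is a made-up example, and the macro variable here only mimics the naming convention; it is not read by the CAHPS Analysis Program.

```sas
/* Hypothetical OUTNAME value, used only to show how the names resolve. */
%let outname = RatingTopBox;

/* Core data sets created for every macro call. */
%put Core data sets: DP&outname LR&outname N_&outname OA&outname P_&outname SA&outname;

/* Additional data sets when case-mix adjusters are used (Y_ only when KP_RESID = 1). */
%put Case-mix data sets: C_&outname R2&outname Y_&outname;
```

The log then lists names such as SARatingTopBox and C_RatingTopBox, which are the data sets to look for after a run with that OUTNAME.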

Table 7.3 shows the data files created if the user keeps permanent data sets for case-mix adjusted

frequencies using the options:

• ADJ_BARS= 1

• BAR_STAT = 1 and

• the stratified weighting option is not used (WGTDATA = 1).

If VARTYPE = 5, there will be additional SAS Data Sets created for each response option.


Table 7.3 Additional SAS Data Sets Output from Saving Case-Mix Adjusted

Frequencies

Description

SAS Data Set

Naming Convention

Score details for all entities for

• Bottom box for Global Rating: 0-6 or 0-7; for 4-Point scales: 1-2; for 3-

Point scales: 1

• Top box for Yes/No scales: Yes

B1&OUTNAME

Score details for all entities for

• Middle box for Global Rating: 7-8 or 8-9; for 4-Point scales: 3; for 3-

Point scales: 2

• Bottom box for Yes/No scales: No

B2&OUTNAME

Score details for all entities

• Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; and for 3-

Point scales: 3

• This data set will not appear when the VAR is dichotomous.

B3&OUTNAME

Overall statistics

• Bottom box for Global Rating: 0-6 or 0-7; for 4-Point scales: 1-2; for 3-

Point scales: 1

• Top box for Yes/No scales: Yes

F1&OUTNAME

Overall statistics

• Middle box for Global Rating: 7-8 or 8-9; for 4-Point scales: 3; for 3-

Point scales: 2

• Bottom box for Yes/No scales: No

F2&OUTNAME

Overall statistics for

• Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; and for 3-

Point scales: 3

• This data set will not appear when the VAR is dichotomous.

F3&OUTNAME

Table 7.4 shows the additional data sets that are created by adding in the post-stratification weighting option (WGTDATA = 2). These data sets provide similar information to the data sets produced when not selecting the option, except they provide the post-stratified weighted results. Specifically, when using the WGTDATA = 2 option, the core SAS data sets (e.g., N_&OUTNAME, OA&OUTNAME, P_&OUTNAME) provide the results for each stratum, while the data sets listed in Table 7.4 are based on the results weighted by the strata (i.e., the post-stratified weighted results).


Table 7.4 Additional SAS Data Sets Output from Post-Stratification Weighting Option

Description

SAS Data Set

Naming Convention

Unadjusted and adjusted (if adjusters are included) post-stratified weighted

percentages of each response (similar to N_&OUTNAME)

NW&OUTNAME

Overall statistics for all entities combined (similar to OA&OUTNAME but

with post-stratified weighted results)

OW&OUTNAME

Percentage missing on the VAR parameter (similar to P_&OUTNAME) PW&OUTNAME

Score details for all entities including significant differences (similar to

SA&OUTNAME but with post-stratified weighted results)

SW&OUTNAME

Table 7.5 provides the data files created if the following parameter statements are included to keep

permanent data sets for case-mix adjusted frequencies:

• ADJ_BARS= 1,

• BAR_STAT = 1, and

• Including post-stratification weighting option (WGTDATA = 2).

These data sets provide similar information to the data sets produced when not selecting the post-stratification weighting option, except they provide the post-stratified weighted results. Specifically, when using the WGTDATA = 2 option, the core SAS data sets (e.g., B1, B2 and B3&OUTNAME, F1, F2 and F3&OUTNAME) provide the results for each stratum, while the data sets listed in Table 7.5 are based on the results weighted by the strata (i.e., the post-stratified weighted results). If VARTYPE = 5, there will be additional SAS data sets created for each response option.

Table 7.5 Additional SAS Data Sets Output from Saving Case-Mix Adjusted

Frequencies for Post-Stratification Weighting Option

Description

SAS Data Set

Naming Convention

Post-stratification weighted score details for all entities

• Bottom box for Global Rating: 0-6 or 0-7; for 4-Point scales: 1-2; for 3-

Point scales: 1

• Top box for Yes/No scales: Yes

BA&OUTNAME

Post-stratification weighted score details for all entities

• Middle box for Global Rating: 7-8 or 8-9; for 4-Point scales: 3; for 3-

Point scales: 2

• Bottom box for Yes/No scales: No

BB&OUTNAME

Post-stratification weighted score details for all entities

• Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; for 3-Point

scales: 3

• This data set will not appear when the VAR is dichotomous.

BC&OUTNAME


Post-stratification weighted overall statistics

• Bottom box for Global Rating: 0-6 or 0-7; for 4-Point scales: 1-2; for 3-

Point scales: 1

• Top box for Yes/No scales: Yes

FA&OUTNAME

Post-stratification weighted overall statistics

• Middle box for Global Rating: 7-8 or 8-9; for 4-Point scales: 3; for 3-

Point scales: 2

• Bottom box for Yes/No scales: No

FB&OUTNAME

Post-stratification weighted overall statistics

• Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; for 3-Point

scales: 3

• This data set will not appear when the VAR is dichotomous.

FC&OUTNAME

Contents of CAHPS Analysis Program SAS Data Sets

The following tables (7.6 – 7.11) list the contents of the CAHPS Analysis Program SAS data sets created

for all macro runs. These data sets implement the following naming conventions where &OUTNAME is

the text assigned by the user to the parameter “OUTNAME” in the CAHPS macro call.

Table 7.6 Contents of DP&OUTNAME: Plans Dropped

Variable name Description

ALLN Total number of respondents in the data set by PLAN

ORIGPLAN PLAN dropped from analysis as there were fewer than 2 records

USEN Number of usable records for PLAN

Table 7.7 Contents of LR&OUTNAME: Lists Plans with 100 or Fewer Records

Variable name Description

ALLN Total number of respondents in the data set by PLAN

NEWPLAN PLAN name (If stratification case, this is the unstratified entity.

Otherwise, this variable contains the same entity as ORIGPLAN)

NPLAN_ID New Plan ID (If stratification case, this is the unstratified entity.

Otherwise, this variable contains the same entity as OPLAN_ID)

OPLAN_ID Original Plan ID (If stratification case, this will be different from

NPLAN_ID)

ORIGPLAN PLAN name (If stratification case, this will be different from

NEWPLAN)

PLAN Entity ID (This is the same as OPLAN_ID. If stratified weight option is

not selected, NPLAN_ID is also the same as entity ID. If subsetting is

not used, SPLAN_ID is the same as entity ID )

PLANTXT Entity ID and Entity name

SPLAN_ID Plan ID (If subsetting is used, the ID number will be created by each

subset. If no subsetting is used, this will be the same as OPLAN_ID)


STRATWGT Strata Weight (If no stratification is used, this will be 1)

SUB_ID Subset ID (If no subset is used, this will be 1)

SUBCODE If subsetting is used, the subset name or code is found in this column.

Otherwise it defaults to 1.

USEN Number of usable records for PLAN

USENTXT Number of usable records for PLAN (stored as a character variable)

Table 7.8 Contents of N_&OUTNAME: Response Option Percentages

Variable name Description

ALLN Total number of respondents in the data set by PLAN

OPLAN_ID Original PLAN ID

PLANNAME Entity name

PTRES1 Percent response for

• Bottom box for Global Rating: 0-6 or 0-7; for 4-Point scales: 1-2;

for 3-Point scales: 1

• Top box for Yes/No scales: Yes

PTRES2 Percent response for

• Middle box for Global Rating: 7-8 or 8-9; for 4-Point scales: 3; for

3-Point scales: 2

• Bottom box for Yes/No scales: No

PTRES3 Percent response for

• Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; for 3-

Point scales: 3

This variable will not appear when the VAR is dichotomous. If

VARTYPE = 5, there will be additional response percentages for each

response option.

SUBCODE If subsetting is used, the subset name or code is found in this column.

Otherwise it defaults to 1.

USEN Number of usable records for PLAN

Additional Variables Included When ADJ_BARS = 1

ADJ_1 Case-mix adjusted percentages for

• Bottom box for Global Rating: 0-6 or 0-7; for 4-Point scales: 1-2;

for 3-Point scales: 1

• Top box for Yes/No scales: Yes

ADJ_2 Case-mix adjusted percentages for

• Middle box for Global Rating 7-8 or 8-9; for 4-Point scales: 3; for

3-Point scales: 2

• Bottom box for Yes/No scales: No


ADJ_3 Case-mix adjusted percentages for

• Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; and

for 3-Point scales: 3

This variable will not appear when the VAR is dichotomous. If

VARTYPE = 5, there will be additional case-mix adjusted percentages

for each response option.

If WGTMEAN is not assigned, then WGT_1-WGT_3 will be the same as PTRES1-PTRES3.

WGT_1 Unadjusted and weighted percentages for

• Bottom box for Global Rating: 0-6 or 0-7; for 4-Point scales: 1-2;

for 3-Point scales: 1

• Top box for Yes/No scales: Yes

WGT_2 Unadjusted and weighted percentages for

• Middle box for Global Rating: 7-8 or 8-9; for 4-Point scales: 3; for

3-Point scales: 2

• Bottom box for Yes/No scales: No

WGT_3 Unadjusted and weighted percentages for

• Top box for Global Rating 9-10 or 10; for 4-Point scales: 4; and for

3-Point scales: 3

This variable will not appear when the VAR is dichotomous. If

VARTYPE = 5, there will be additional unadjusted and weighted

percentages for each response option.

Table 7.9 Contents of OA&OUTNAME: Results from Test for Significant Differences

Variable name Description

DFE The denominator degrees of freedom

DFR The numerator degrees of freedom

GM Grand mean used for F statistics

NTOT Number of respondents analyzed

OV_MEAN The mean of all the PLAN means

OVERALLF The results of the F-test on the null hypothesis for no difference between

entity means

OVERALLP P-value of the F distribution. If the P-value is less than 0.05 (or other

preferred value), the PLAN means are significantly different.

SUBCODE If subsetting is used, the subset name or code is found in this column.

Otherwise it defaults to GLOBAL.
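As a hedged cross-check on these variables, the sketch below recomputes a p-value from an F statistic and its degrees of freedom using the SAS PROBF function. The numbers are invented stand-ins for one row of OA&OUTNAME, and the sketch assumes OVERALLP is the upper-tail probability of the F distribution; the data set name oa_check and the variables p_check and significant are hypothetical.

```sas
data oa_check;
   /* Made-up values standing in for OVERALLF, DFR, and DFE. */
   overallf = 3.2;
   dfr      = 4;     /* numerator degrees of freedom */
   dfe      = 495;   /* denominator degrees of freedom */

   /* PROBF returns the cumulative F probability, so subtract from 1 for the upper tail. */
   p_check = 1 - probf(overallf, dfr, dfe);

   /* Flag a significant difference between plan means at the 0.05 level. */
   significant = (p_check < 0.05);
run;

proc print data=oa_check noobs;
run;
```

If the Pvalue parameter was changed from its default of 0.05 in the macro call, the comparison threshold in the last assignment would need to match it.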


Table 7.10 Contents of P_&OUTNAME: Percent Missing Data

Variable name Description

ALLN Total number of respondents in the data set by PLAN

PLAN Entity ID

PLANNAME Entity name

&VAR The percent of responses on the VAR variable(s) that are missing by

PLAN

SUBCODE If subsetting is used, the subset name or code is found in this column.

Otherwise it defaults to 1.

Additional variables included when ADJUSTER(S) are included

A separate variable for each adjuster is included in the results. This example shows 4 adjuster

variables.

ADJUSTER1 The percent of responses on the Adjuster 1 variable that are missing by

PLAN

ADJUSTER2 The percent of responses on the Adjuster 2 variable that are missing by

PLAN

ADJUSTER3 The percent of responses on the Adjuster 3 variable that are missing by

PLAN

ADJUSTER4 The percent of responses on the Adjuster 4 variable that are missing by

PLAN

Table 7.11 Contents of SA&OUTNAME: Plan Level Results

Variable name Description

ADJ_MEAN Weighted (if assigned) and adjusted plan mean for case-mix adjuster

variables. If no adjuster or weighting is selected, this will match

UNA_MEAN.

ALLN Total number of respondents in the data set by PLAN

CL95 Half-width of the 95% confidence interval, calculated as 1.96*SE. The
true (population) value of the estimate (DELTA) falls within the interval
(Estimate - CL95, Estimate + CL95) with 95% confidence.

DELTA The difference between the PLAN mean and overall mean

MEANING Rating of plan performance for the VAR variable based on a comparison

of the “Plan Mean” to “Overall Mean.”

Identifies statistically meaningful differences:

1 = Plan was significantly below average

2 = Plan was not significantly above or below average

3 = Plan was significantly above average

PLAN_WGT Value of the PLAN weight. If weight not assigned, then this column has

PLAN_WGT = 1.

PLANNAME Entity name

SE Standard error of “Plan Difference From Mean” or Delta

SUBCODE If subsetting is used, the subset name or code is found in this column.

Otherwise it defaults to 1.


UNA_MEAN Weighted (if assigned) and unadjusted plan mean (even if case-mix

adjuster variables are included)

USEN Number of usable records for PLAN

UWT_MEAN Unweighted and unadjusted plan mean (even if weighting is assigned

and case-mix adjuster variables are included)

VP Variance of the plan means
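The plan-level variables can be turned into readable output with a few lines of SAS. The sketch below is illustrative only: it uses a small made-up data set in place of SA&OUTNAME, and the names sa_example, sa_ci, lower, upper, and meanfmt are hypothetical. It builds the interval (DELTA - CL95, DELTA + CL95) and attaches labels to the MEANING codes.

```sas
/* Made-up rows with the same variable names as SA&OUTNAME. */
data sa_example;
   input planname :$20. delta cl95 meaning;
   datalines;
PlanA 0.12 0.08 3
PlanB -0.03 0.09 2
PlanC -0.15 0.10 1
;
run;

/* Labels for the MEANING codes described in Table 7.11. */
proc format;
   value meanfmt 1 = 'Significantly below average'
                 2 = 'Not significantly different'
                 3 = 'Significantly above average';
run;

data sa_ci;
   set sa_example;
   lower = delta - cl95;   /* lower bound of the 95% interval around DELTA */
   upper = delta + cl95;   /* upper bound of the 95% interval around DELTA */
   format meaning meanfmt.;
run;

proc print data=sa_ci noobs;
run;
```

Note that MEANING in the real output also reflects the Pvalue, Change, and Meandiff settings, so it may not agree with a simple check of whether the interval excludes zero.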

Six additional files are created when case-mix adjustment is performed AND

• the ADJ_BARS= 1 and BAR_STAT = 1 options are chosen, and

• no post-stratification weighting is performed (WGTDATA = 1).

The first three data sets have the same variables described in SA&OUTNAME, which provides the PLAN

level overall results (see Table 7.11) but these additional files provide results for each response option

(collapsed).

• B1&OUTNAME provides the statistics for the:

• Bottom box for the Global Rating (Response options 0-6 or 0-7), 4-Point scales

(Response options 1-2), or 3-Point scales (Response option 1).

• Top box for the dichotomous Yes/No scales (Yes response option).

• B2&OUTNAME provides the statistics for the:

• Middle box for the Global Rating (Response options 7-8 or 8-9), 4-Point scales
(Response option 3), or 3-Point scales (Response option 2).

• Bottom box for the dichotomous Yes/No scales (No response option).

• B3&OUTNAME provides the statistics for the:

• Top box for the Global Rating (Response options 9-10 or 10), 4-Point scales
(Response option 4), or 3-Point scales (Response option 3).

For dichotomous variables, this data set is not created.

The second three data sets have the same variables described in OA&OUTNAME, which provides overall

results for all PLANS combined (see Table 7.9), but these additional files provide results for each

response option (collapsed).

• F1&OUTNAME provides the results from the tests for significant differences between entities

for the:

• Bottom box for the Global Rating (Response options 0-6 or 0-7), 4-Point scales

(Response options 1-2), or 3-Point scales (Response option 1).

• Top box for dichotomous Yes/No scales (Yes response option).


• F2&OUTNAME provides the results from the tests for significant differences between entities

for the:

• Middle box for the Global Rating (Response options 7-8 or 8-9), 4-Point scales
(Response option 3), or 3-Point scales (Response option 2).

• Bottom box for dichotomous Yes/No scales (No response option).

• F3&OUTNAME provides the results from the tests for significant differences between entities

for the:

• Top box for the Global Rating (Response options 9-10 or 10), 4-Point scales
(Response option 4), or 3-Point scales (Response option 3).

For dichotomous variables, this data set is not created.

Tables 7.12 – 7.14 list the contents of the additional SAS data sets produced when using case-mix
adjusters.

Table 7.12 Contents of C_&OUTNAME: Case-mix Adjustment Regression

Coefficients

Variable name Description

COE_&OUTNAME Case-mix adjustment regression coefficients

P_&OUTNAME P-value of case-mix adjustment regression coefficient

SE_&OUTNAME Standard error of case-mix adjustment regression coefficient

SUBCODE If subsetting is used, the subset name or code is found in this column.

Otherwise it defaults to GLOBAL.

VARIABLE Name of case-mix adjuster variable(s)

Table 7.13 Contents of R2&OUTNAME – R-Squared Results for the Case-mix

Adjustment Regression

Variable name Description

_ADJRSQ_ The adjusted R-squared value from the regression for the dependent

variable (OUTNAME variable)

_DEPVAR_ Name of the OUTNAME variable

_RSQ_ The R-squared value from the regression for the dependent variable

(OUTNAME variable)

SPLIT This variable is used when data is split into two groups. If the data is

not split, it defaults to 0.

SUBCODE If subsetting is used, the subset name or code is found in this column.

Otherwise it defaults to GLOBAL.


Table 7.14 lists the contents of the additional SAS data set produced when using case-mix adjuster

variables and when KP_RESID = 1.

Table 7.14 List of Macro Results – Y_&OUTNAME

Variable name Description

_ID Individual assigned ID

_ID_RESP Original Respondent ID

ITEMNO_MAX Number of items (for single measures this will equal 1; for composite

measures this will equal the number of items in the composite measure)

PLAN Entity ID

YRESID Residual

Tables 7.15 – 7.17 provide the contents of the data sets produced when using the post-stratification

weighting option.

Table 7.15 List of Macro Results – NW&OUTNAME (Similar to N_&OUTNAME but

provides the post-stratification weighted results)

Variable name Description

ALLN Total number of respondents in the data set by PLAN

OPLAN_ID Original PLAN ID

PLANNAME Entity name

PTRES1 Post-stratification weighted percentage for

• Bottom box for Global Rating 0-6 or 0-7; for 4-Point scales: 1-2; for

3-Point scales: 1

• Top box for Yes/No scales: Yes

PTRES2 Post-stratification weighted percentage for

• Middle box for Global Rating:7-8 or 8-9; for 4-Point scales: 3; for 3-

Point scales: 2

• Bottom box for Yes/No scales: No

PTRES3 Post-stratification weighted percentage for

• Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; for 3-

Point scales: 3

This variable will not appear when the VAR is dichotomous. If

VARTYPE = 5, there will be additional post-stratification weighted

percentages for each response option.

SUBCODE If subsetting is used, the subset name or code is found in this column.

Otherwise it defaults to 1.

USEN Number of usable records for PLAN

Additional Variables Included When ADJ_BARS = 1

ADJ_1 Post-stratification weighted case-mix adjusted percentages for

• Bottom box for Global Rating 0-6 or 0-7; for 4-Point scales: 1-2; for

3-Point scales: 1

• Top box for Yes/No scales: Yes

ADJ_2 Post-stratification weighted case-mix adjusted percentages for

• Middle box for Global Rating: 7-8 or 8-9; for 4-Point scales: 3; for

3-Point scales: 2

• Bottom box for Yes/No scales: No

ADJ_3 Post-stratification weighted case-mix adjusted percentages for

• Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; for 3-

Point scales: 3

This variable will not appear when the VAR is dichotomous. If

VARTYPE = 5, there will be additional post-stratification weighted

case-mix adjusted percentages for each response option.

If WGTMEAN is not assigned, then WGT_1-WGT_3 will be the same as PTRES1-PTRES3.

WGT_1 Unadjusted, weighted, and post-stratification weighted percentages for

• Bottom box for: Global Rating 0-6 or 0-7; for 4-Point scales: 1-2;

for 3-Point scales: 1

• Top box for Yes/No scales: Yes

WGT_2

Unadjusted, weighted, and post-stratification weighted percentages for

• Middle box for Global Rating:7-8 or 8-9; for 4-Point scales: 3; for 3-

Point scales: 2

• Bottom box for Yes/No scales: No

WGT_3 Unadjusted, weighted, and post-stratification weighted percentages for

• Top box for Global Rating: 9-10 or 10; for 4-Point scales: 4; for 3-

Point scales: 3

This variable will not appear when the VAR is dichotomous. If

VARTYPE = 5, there will be additional unadjusted, weighted, and post-

stratification weighted percentages for each response option.

Table 7.16 Contents of OW&OUTNAME (Similar to OA&OUTNAME but provides

post-stratification weighted results)

Variable name Description

DFE The denominator degrees of freedom

DFR The numerator degrees of freedom

GM Grand mean used for F statistics

NTOT # of respondents analyzed

OV_MEAN The mean of all the PLAN means

OVERALLF The results of the F-test on the null hypothesis for no difference between

entity means

OVERALLP P-value of the F distribution. If the P-value is less than 0.05 (or other

preferred value), the PLAN means are significantly different.

SUBCODE If subsetting is used, the subset name or code is found in this column.

Otherwise it defaults to GLOBAL.

Table 7.17 Contents of SW&OUTNAME (Similar to SA&OUTNAME but provides post-

stratification weighted results)

Variable name Description

ADJ_MEAN Post-stratification weighted, weighted (if assigned by WGTRESP) and

adjusted plan mean for case-mix adjuster variables. If no adjuster or

weighting is selected, this will match UNA_MEAN.

ALLN Total number of respondents in the data set by PLAN

CL95 Half-width of the 95% confidence interval, calculated as 1.96*SE. The

true (population) value of the estimate (DELTA) falls within the interval

(Estimate -CL95, Estimate +CL95) with 95% confidence.

DELTA The difference between plan mean and overall mean

MEANING Rating of plan performance for the VAR variable based on a comparison

of plan’s adjusted and post-stratification weighted “Plan Mean” to

“Overall Mean.”

Identifies statistically meaningful differences.

1 = Plan was significantly below average

2 = Plan was not significantly above or below average

3 = Plan was significantly above average

PLAN_WGT

Value of plan weight. If weight not assigned, then this column has

PLAN_WGT = 1.

PLANNAME Entity name

SE Standard error of “Plan Difference From Mean” or Delta

SUBCODE If subsetting is used, the subset name or code is found in this column.

Otherwise it defaults to 1.

UNA_MEAN Weighted (if assigned by WGTRESP) and unadjusted plan mean (even if

case mix adjuster variables are included)

USEN Number of usable records for PLAN

UWT_MEAN Unweighted, unadjusted, and post-stratification weighted plan mean

(even if weighting is assigned and case mix adjuster variables are

included)

VP Variance of the plan means

Six additional files are created when case-mix adjustment is performed and

• the ADJ_BARS= 1 and BAR_STAT = 1 options are chosen and

• post-stratification weighting is performed (WGTDATA = 2 (post-stratification weighting)).

The first three data sets have the same variables described in SW&OUTNAME, which provides the
PLAN level overall results (see Table 7.17), as well as in B1, B2 and B3&OUTNAME. These additional
files provide results for each response option (collapsed).

• BA&OUTNAME provides the post-stratification weighted statistics for the:

• Bottom box for the Global Rating (Response options 0-6 or 0-7), 4-Point scales

(Response options 1-2), or 3-Point scales (Response option 1).

• Top box for dichotomous Yes/No scales (Yes response option).

• BB&OUTNAME provides the post-stratification weighted statistics for the:

• Middle box for the Global Rating (Response options 7-8 or 8-9), 4-Point scales
(Response option 3), or 3-Point scales (Response option 2).

• Bottom box for dichotomous Yes/No scales (No response option).

• BC&OUTNAME provides the post-stratification weighted statistics for the:

• Top box for the Global Rating (Response options 9-10 or 10), 4-Point scales
(Response option 4), or 3-Point scales (Response option 3).

For dichotomous variables, this data set is not created.

The second three data sets have the same variables described in OW&OUTNAME, which provides

overall results for all PLANS combined (see Table 7.16) as well as the F1-F3&OUTNAME data set.

These additional files provide results for each response option (collapsed).

• FA&OUTNAME provides the post-stratification weighted results from the tests for significant

differences between entities for the:

• Bottom box for the Global Rating (Response options 0-6 or 0-7), 4-Point scales

(Response options 1-2) or 3-Point scales (Response option 1).

• For dichotomous Yes/No scales it provides the top box or Yes response option.

• FB&OUTNAME provides the post-stratification weighted results from the tests for significant

differences between entities for the:

• Middle box for the Global Rating (Response options 7-8 or 8-9), 4-Point scales
(Response option 3), or 3-Point scales (Response option 2).

• Bottom box for dichotomous Yes/No scales (No response option).

• FC&OUTNAME provides the post-stratification weighted results from the tests for significant

differences between entities for the:

• Top box for the Global Rating (Response options 9-10 or 10), 4-Point scales
(Response option 4), or 3-Point scales (Response option 3).

For dichotomous variables, this data set is not created.

Appendix A. Using the Test Data in the CAHPS Analysis Program

This appendix explains how to use the test programs and data set provided in the ZIP file for the CAHPS

Analysis Program version 5.0:

• MACRO_CAHPS50.SAS - This is the core SAS macro program that performs the analyses the

user specifies in the SAS test program. The macro file should not be modified.

• _1_TEST_FORMAT_CAHPS50.SAS - This program creates formats, which are helpful to

view the data with descriptive words instead of the numeric data values assigned in data (e.g.,

“Always” is shown rather than a “4”).

• _2_TEST_PREPDATA_CAHPS50.SAS - This program contains sample code to create

recoded versions of some variables for the macro run (e.g., by creating dichotomous or reversed-

coded variables).

• _3A_TEST_CAHPS50.SAS - This short program provides sample code for calling the macro

program with different analysis options and outputs specified.

• _3B_TEST_CAHPS50_STRATIFIED.SAS - This program contains sample code for calling

the macro with the post-stratification weighting option.

• TEST_CAHPS50_DATA.SAS7BDAT - This sample SAS data set is used with all the test

programs listed above.

• TEST_CAHPS50_DATA_recoded.SAS7BDAT - This sample SAS data set is similar to
TEST_CAHPS50_DATA.SAS7BDAT, except that the recoded variables created by
_2_TEST_PREPDATA_CAHPS50.sas have already been created for you. This SAS data set is
for users who do not need to use _2_TEST_PREPDATA_CAHPS50.sas.

Before you begin, you need to assign two different library paths for input and output data. The sample

code is listed below. Additionally, the formats are stored in the input data set directory.

%let ProgramName = Test_cahps50 ;
%let root = /data/cahps/analysis_program/version5.0 ;
libname in "&root./data/" ;
libname out "&root./Test_cahps50/" ;
libname library "&root./data/" ;

Once you have determined the location of the input and output data, you can begin to use each test

program following the steps below.

Step 1. Creating the Format Catalog (_1_TEST_FORMAT_CAHPS50)

To use the test data, you should first run _1_TEST_FORMAT_CAHPS50.sas to create the formats.

Within the program, you will need to assign a library name for storing the format file as shown below.

Note that the directory for the formats should have the same path as the data set used for the macro run.

%let root = /data/cahps/analysis_program/version5.0 ;
libname in "&root./data/your format path here" ;

Below is an excerpt from the Test Format Catalog.

proc format library = in ; /*place the library name where you want to

store the format*/

title "CAHPS Survey Formats for TEST Data Set Version &version " ;

value ynb /*Provides the labeled response categories for each response option

for all variables assigned the format ynb*/

. = ' .: Missing '

1 = ' 1: Yes '

2 = ' 2: No '

98 = '98: Inapplicable '

99 = '99: No Answer Given '

;

value edu /*Provides the labeled response categories for each response option

for all variables assigned the format edu*/

1=' 1: <= 8TH GRADE'

2=' 2: SOME HS'

3=' 3: HS GRAD/GED'

4=' 4: SOME COLLEGE/2-YR DEGREE'

5=' 5: 4-YR COLLEGE GRAD'

6=' 6: >4-YR COLLEGE DEGREE'

98="98: DON'T KNOW"

99='99: REFUSED' ;

run;
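To see what a format such as ynb does, it can help to tabulate a variable with the format attached. The following is a minimal sketch and not part of the test programs: the format name ynb_demo, the data set work.demo, and its values are made up for illustration, and the format is stored in the temporary WORK catalog so that the block runs on its own.

```sas
/* Illustrative only: a small copy of the yes/no format, kept in WORK */
proc format;
   value ynb_demo
      .  = ' .: Missing'
      1  = ' 1: Yes'
      2  = ' 2: No'
      98 = '98: Inapplicable'
      99 = '99: No Answer Given';
run;

/* Hypothetical responses to a yes/no item */
data work.demo;
   input q05 @@;
   datalines;
1 2 1 98 99 .
;
run;

/* The FORMAT statement shows labels in place of the numeric codes */
proc freq data = work.demo;
   tables q05 / missing;
   format q05 ynb_demo.;
run;
```

With the permanent catalog created by _1_TEST_FORMAT_CAHPS50.sas, the same FORMAT statement would name ynb instead, provided SAS can locate the catalog (for example, through the LIBRARY libref assigned earlier).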

Step 2. Preparing the Data for the Macro (_2_TEST_PREPDATA_CAHPS50)

The _2_TEST_PREPDATA_CAHPS50.sas program demonstrates sample SAS code to prepare the data

set before the macro run. More resources can be found in Preparing Data from CAHPS Surveys for

Analysis (available on the AHRQ CAHPS web page about analyzing CAHPS survey data).

The sample code provided in the TEST_PREPDATA file is intended to work with the TEST data set.

You can utilize this code for your own data sets, but will need to make modifications to the statements

depending on the variable names and variable response options in your data set.

1. Set permanent or temporary SAS data set.

data adult;

set in.test_cahps50;

2. Recodes numeric plan variables to character to simplify interpretation of

the result tables.

length plan $ 16 ;

if planid = 1 then plan = 'PRACTICE_A_URBAN' ;
else if planid = 2 then plan = 'PRACTICE_B_URBAN' ;
else if planid = 3 then plan = 'PRACTICE_C_URBAN' ;
else if planid = 4 then plan = 'PRACTICE_B_RURAL' ;
else if planid = 5 then plan = 'PRACTICE_C_RURAL' ;

3. Recodes dichotomous variables from 1-2 to 1-0; such that the largest number

represents the most positive response.

array org q05 q26;

array rev q05_re q26_re;

do i = 1 to dim ( rev ) ;

if org [i] in (1, 2) then rev [i] = 2 - org [i] ;

else rev [i] = . ;

end ;

4. REVERSE codes item in which never is a positive response and always is a

negative response.

array org2 q23 q24;

array rev2 q23_re q24_re;

do i = 1 to dim ( rev2 ) ;

if org2 [i] in (1, 2, 3, 4, 5)

then rev2 [i] = 6 - org2 [i] ;

else rev2 [i] = . ;

end ;

5. Cleans variables age and general health status for out of range values.

age = q25;

ghr = q27;

if age not in (1, 2, 3, 4, 5, 6, 7) then age = . ;

if ghr not in (1, 2, 3, 4, 5, 6) then ghr = . ;
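After recoding, a quick cross-tabulation of each original variable against its recoded version is a simple way to confirm that the recode behaved as intended. The sketch below is illustrative only: it repeats the 1-2 to 1-0 recode from item 3 on a small made-up data set (work.check and its values are hypothetical) so that it runs without the test data.

```sas
/* Hypothetical yes/no responses, including out-of-range and missing codes */
data work.check;
   input q05 @@;
   /* Same rule as item 3 above: 1 (Yes) -> 1, 2 (No) -> 0, anything else -> missing */
   if q05 in (1, 2) then q05_re = 2 - q05;
   else q05_re = .;
   datalines;
1 2 98 99 . 1
;
run;

/* Each original value should map to exactly one recoded value */
proc freq data = work.check;
   tables q05 * q05_re / list missing;
run;
```

The same TABLES request can be pointed at the data set produced by the preparation program, one original and recoded pair at a time.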

Step 3. Running Macro_CAHPS50.sas—Specifying Parameter Options

This step has two variants. If you are not using post-stratification weighting, refer to Step 3a. If you
are using post-stratification weighting, refer to Step 3b.

Step 3a. No Post-Stratification Weighting Case

The following statement includes the macro code MACRO_CAHPS50.SAS where the path before

“Macro_cahps50.sas” is the location of the Macro file.

%include "/data/cahps/macros/data/Macro_cahps50.sas" ;

Examples of using these arguments with the test data set are provided below.

* Executes CAHPS macro for the global rating scale variable, with no case-mix adjuster variables.

%cahps(

var = q18, /*Name of the variable to be

analyzed*/

vartype = 2, /*Set the type of variable: 2 = rating

scale(0-10)*/

name = Rating Provider, /*Label for the outcome variable*/

adultkid = 3, /*Specify how to analyze child and

adult surveys: 3 = analyze adult data

only*/

dataset = test, /*Name of the input data set*/

outname = rprov /*Name used for the output data set*/

);

* Executes CAHPS macro for the “How Often” composite measure, with two case-mix adjuster
variables; instructs the macro to impute any missing case-mix adjuster variable responses;
recodes the 4-point scale to collapse into 1-2|3|4 for 3-part frequency; and uses PROC
SURVEYREG for the case-mix model.

%cahps(
var = q11 q12 q14 q15, /*Names of the variables in the composite measure to be analyzed*/
vartype = 3, /*Set the type of variables: 3 = "never" to "always" (1-4)*/
recode = 1, /*Recode the scale: 1 = 3-part frequency where the 4-point scale is collapsed into 1-2|3|4*/
name = Provider Communication Composite measure, /*Label for the outcome variable*/
adultkid = 3, /*Specify how to analyze child and adult surveys: 3 = analyze adult data only*/
adjuster = age ghr, /*List of case-mix adjuster variables to include*/
impute = 1, /*Flag to impute case-mix adjusters that are missing*/
dataset = test, /*Name of the input data set*/
outname = ProvComm, /*Name used for the output data set*/
proc_type = 1 /*Specifies the use of PROC SURVEYREG for the case-mix model*/
);

Step 3b. Post-Stratification Weighting Case

To combine data for reporting from different sampling groups, or strata, you must add a text file to the

program before the macro run. Some examples illustrate situations in which this feature might be used:

1. Two health plans are merged that were formerly separate and were treated as such in the survey.

The two former health plans are the strata, with each assigned a weight to combine for post-

stratification into a single health plan score.

2. A health system surveys patients in all their practice sites by urban or rural locations, but the

system wants to combine these urban and rural patients into their respective practice sites for

reporting.

If stratification is part of your survey design, you must create an ASCII data set with columns separated

by one or more spaces for these four variables:

• Original Plan – a unique identifier of the entities or strata before they are combined. This

variable can be coded as alphanumeric, but it cannot exceed 16 characters. This variable is the

first column of the data table.

• New Plan – identifier for the entities that will be created by combination of strata or post-

stratification. This variable can be coded as alphanumeric, but it cannot exceed 16 characters.

This variable is the second column of the data table. If no stratification is being done, this column

may look identical to the column for original plan.

• Strata Weight – a numeric variable that indicates the size of the population for the entities or

strata. This variable is used to create the weights for combining the strata. This variable is the

third column of the data table. If no stratification is being done, this column may be set to 1s.

• Subsetting Code – identifier for the subset (i.e., region, state, county) in which the entity

belongs. This variable can be coded as alphanumeric. If no subsetting is to be done, this column

may be set to 1s.

The ASCII file for the PLAN details should not contain any missing data; each column of data should be

separated by spaces. If tabs are used, the Analysis Program may not read in the data correctly. Also, be

sure not to have any extra records at the bottom of the ASCII file. Importing an ASCII file into SAS is the

only way to add stratification information.

You may use the _3B_TEST_CAHPS50_STRATIFIED program as a starting point to create a PLAN detail
ASCII file and change variable names and paths as needed.

An example of the PLAN detail data set is provided for the test program
(_3B_TEST_CAHPS50_STRATIFIED.SAS). The data file is called “plandtal.dat” and looks like the text
below:

Table A.1 Sample Data for Post-Stratification Weighting

Origplan (PLAN for     Newplan (post-           stratwgt   Subcode
each stratum)          stratification PLAN)
PRACTICE_A_URBAN       PRACTICE_A               5000       Northeast
PRACTICE_B_URBAN       PRACTICE_B               8000       Northeast
PRACTICE_C_URBAN       PRACTICE_C               15000      South
PRACTICE_B_RURAL       PRACTICE_B               2000       Northeast
PRACTICE_C_RURAL       PRACTICE_C               3000       South

• The first column provides a unique identifier for each stratum/plan (original plan).
• The second column, the new plan variable, indicates which strata/plans will be combined for
post-stratification weighting.
• The third column, the stratum population size, is used to compute the weights for the strata. Strata
with greater population sizes receive more weight than smaller strata in the combined plan.
• The fourth column is the region (subset) of the country in which each stratum/plan does business.
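The layout of the PLAN detail file and the role of the strata weight can be sketched in a few lines of SAS. This is an illustration under stated assumptions, not the code in the test program: the data set names work.plan_detail and work.strata_share are hypothetical, the rows are typed in with DATALINES instead of being read from plandtal.dat, and the share column assumes that each stratum contributes in proportion to its population size within the new plan.

```sas
/* Read the four space-separated columns with list input */
data work.plan_detail;
   length origplan newplan $ 16 subcode $ 16;
   input origplan $ newplan $ stratwgt subcode $;
   datalines;
PRACTICE_A_URBAN PRACTICE_A 5000 Northeast
PRACTICE_B_URBAN PRACTICE_B 8000 Northeast
PRACTICE_C_URBAN PRACTICE_C 15000 South
PRACTICE_B_RURAL PRACTICE_B 2000 Northeast
PRACTICE_C_RURAL PRACTICE_C 3000 South
;
run;

/* Assumed proportional weighting: stratum size divided by the total size of its new plan */
proc sql;
   create table work.strata_share as
   select origplan, newplan, stratwgt,
          stratwgt / sum(stratwgt) as share
   from work.plan_detail
   group by newplan
   order by newplan, origplan;
quit;

proc print data = work.strata_share;
run;
```

Under this assumption, PRACTICE_B_URBAN would carry 8000 / (8000 + 2000) = 0.8 of the combined PRACTICE_B result and PRACTICE_B_RURAL the remaining 0.2. To read an external file instead, the DATALINES block can be replaced by an INFILE statement that points to the file.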

After the text file is created, use code similar to that in Step 3a to run the macro. In the macro
statement, you must set WGTDATA = 2 to use the post-stratification weighting. For more details, please
refer to the examples of using these arguments with the test data set in _3B_TEST_CAHPS50_STRATIFIED.SAS.

Examples of using these parameters with the test data set are provided below.

* Executes CAHPS macro with global rating scale variable, case-mix adjusters, and using post-

stratification weighting.

%cahps(

var = q18, /*Name of the variable to be analyzed*/

vartype = 2, /*Set the type of variable: 2 = rating

scale(0-10)*/

name = Rating Provider, /*Label for the outcome variable*/

adjuster = q25 q23_re, /* List of case-mix adjuster variables to

include */

adultkid = 3, /*Specify how to analyze child and adult

surveys: 3 = analyze adult data only*/

adj_bars = 1, /*Flag for the frequencies to be case-mix

adjusted*/

bar_stat = 0, /*Flag if case-mix adjusted frequencies

should be saved*/

wgtdata = 2, /*Combine strata and conduct post-

stratification weighting */

subset = 3, /*Subset case-mix adjustment model for the subset group (Northeast and South) */

impute = 1, /*Flag to impute case-mix adjusters that are missing*/

dataset = test2, /*Name of the input data set*/

outname = rprov /*Name used for the output data set*/

);

Contents of the Test Data Set (TEST_CAHPS50_DATA)

Table A.2 lists the contents of the test data set (TEST_CAHPS50_DATA.sas7bdat).

Table A.2 Description of test data set variables based on CAHPS Clinician & Group

Adult Survey 3.0

Variable  Description
          Response options and formats

PlanID    Plan identification number
          1 = PRACTICE_A_URBAN; 2 = PRACTICE_B_URBAN; 3 = PRACTICE_C_URBAN; 4 = PRACTICE_B_RURAL; 5 = PRACTICE_C_RURAL; . = Missing

Q05       Last 6 months, make appointment for an illness, injury with provider
          1=Yes; 2=No; .=Missing; 98=Inapplicable; 99=No Answer Given

Q06       Last 6 months, how often get appointment for routine care as soon as needed
          1=Never; 2=Sometimes; 3=Usually; 4=Always; .=Missing; 98=Inapplicable; 99=No Answer Given

Q11       Last 6 months, how often provider explains things
          1=Never; 2=Sometimes; 3=Usually; 4=Always; .=Missing; 98=Inapplicable; 99=No Answer Given

Q12       Last 6 months, how often provider listens carefully
          1=Never; 2=Sometimes; 3=Usually; 4=Always; .=Missing; 98=Inapplicable; 99=No Answer Given

Q14       Last 6 months, how often provider shows respect
          1=Never; 2=Sometimes; 3=Usually; 4=Always; .=Missing; 98=Inapplicable; 99=No Answer Given

Q15       Last 6 months, how often provider spends enough time with you
          1=Never; 2=Sometimes; 3=Usually; 4=Always; .=Missing; 98=Inapplicable; 99=No Answer Given

Q18       Last 6 months, rate provider
          0 (worst) - 10 (best); .=Missing; 98=Inapplicable; 99=No Answer Given

Q23       Rate overall general health
          1=Excellent; 2=Very Good; 3=Good; 4=Fair; 5=Poor; .=Missing; 98=Inapplicable; 99=No Answer Given

Q24       Rate overall mental health
          1=Excellent; 2=Very Good; 3=Good; 4=Fair; 5=Poor; .=Missing; 98=Inapplicable; 99=No Answer Given

Q25       Age
          1=18 to 24; 2=25 to 34; 3=35 to 44; 4=45 to 54; 5=55 to 64; 6=65 to 74; 7=75 or older; .=Missing; 98=Inapplicable; 99=No Answer Given

Q26       Gender
          1=Male; 2=Female; .=Missing; 98=Inapplicable; 99=No Answer Given

Q27       Highest education level completed
          1=<= 8th grade; 2=Some high school; 3=High school grad/GED; 4=Some college/2-yr degree; 5=4-yr college grad; 6=>4-yr college degree; .=Missing; 98=Inapplicable; 99=No Answer Given

Appendix B. Statistical Explanation of Macro Parameters

This appendix contains detailed explanations of some of the macro parameters. It is divided into three
subsections:

• Detailed Explanation of Analyses Performed in the CAHPS Analysis Program

• Code Descriptions and Resources

• Detailed Explanation with Statistical Notations (describes how some of the macro parameters are

implemented in the Analysis Program)

Detailed Explanation of Analyses Performed in the CAHPS Analysis Program

Case-mix Adjustment

Health status and age are two patient characteristics frequently found to be associated with patient reports

about the quality of their medical care. People in worse health tend to report more problems with care

than do people in better health. Older patients tend to report fewer problems with care than do younger

patients, although this association is usually not as strong as the one between health status and ratings.

Health status may be related to ratings of care because sicker persons are more likely to give negative
ratings in general (response tendency), because some people are likely to give negative ratings about
anything, including their health and the medical care they receive (correlated error), or because they get
worse care (perhaps because their greater needs create more opportunities for failure). The age
association has the same ambiguity. However, regardless of the reason, it is misleading to rate an entity
worse simply because of the kind of patients it treats.

In the Analysis Program, if data are missing for an adjuster variable, the program either (at the option of

the user) deletes the case or imputes the entity mean for that variable. The latter procedure avoids losing

observations because of missing data; it is acceptable in this setting because, typically, both the size of the

adjustment and the amount of missing data on adjusters are small.
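The entity-mean imputation described above can be pictured with a short PROC SQL step. This is a sketch of the idea only, not the macro's internal code: the data set work.adult_demo, the id column, and the values are made up, and the imputed columns are given new names (age_imp, ghr_imp) so that the original values stay visible.

```sas
/* Hypothetical respondents with some missing adjuster values */
data work.adult_demo;
   length plan $ 16;
   input id plan $ age ghr;
   datalines;
1 PRACTICE_A 3 2
2 PRACTICE_A 5 .
3 PRACTICE_A . 4
4 PRACTICE_B 2 1
5 PRACTICE_B 4 3
;
run;

/* Replace each missing adjuster with the mean of the non-missing values in the same entity */
proc sql;
   create table work.adult_imputed as
   select id, plan, age, ghr,
          coalesce(age, mean(age)) as age_imp,
          coalesce(ghr, mean(ghr)) as ghr_imp
   from work.adult_demo
   group by plan
   order by id;
quit;
```

In this toy data, respondent 3 receives age_imp = 4 (the mean of 3 and 5 in PRACTICE_A) and respondent 2 receives ghr_imp = 3 (the mean of 2 and 4). SAS writes a note to the log that summary statistics were remerged with the original rows, which is expected here.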

Sometimes case-mix adjustments may be required for an entity, but for some reason it would not be

desirable for the ratings from that entity to affect the estimated case-mix coefficients or the recentering of

entity scores. An example in Medicare CAHPS would be where the purpose of the implementation is to

make comparisons among Medicare Advantage (MA) plans, but data were also collected for non-MA

plans and the survey user wants to include them for comparison without affecting the MA scores. A quick

way to implement case-mix adjustment in this instance is to use the case-weighting option. Data from the

entities designated not to affect the model are retained in the sample but assigned very small weights

(such as 0.0000001, or 0.0000001 times their sampling weights if the data are already weighted). The

case-mix model is then applied as usual, using the weights. This trick works because (1) the weights for

the designated entities are so small that the associated data have essentially no influence on the fitted

model and (2) case-mix adjustment is performed in full irrespective of the weights.
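The down-weighting approach could be prepared in a DATA step before the macro run. The sketch below is hypothetical throughout: the data set work.survey_demo, the indicator ma_flag, and the weight variables wgt and wgt_model are illustrative names, not variables expected by the Analysis Program.

```sas
/* Hypothetical respondents: ma_flag = 1 for entities that should drive the model */
data work.survey_demo;
   length plan $ 16;
   input plan $ ma_flag wgt;
   datalines;
MA_PLAN_1 1 120
MA_PLAN_1 1 80
NONMA_PLAN 0 200
NONMA_PLAN 0 150
;
run;

/* Keep every case, but shrink the weights of the designated entities */
data work.survey_wgt;
   set work.survey_demo;
   if ma_flag = 1 then wgt_model = wgt;
   else wgt_model = wgt * 0.0000001;   /* tiny multiple of the sampling weight */
run;
```

The resulting weight variable would then be supplied to the macro as the respondent-level weight, so that the designated entities remain in the output while contributing almost nothing to the fitted coefficients.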

Case Weighting

Weighting arises at three points in the computations performed by the CAHPS Analysis Program: (1)

Estimation of case-mix regression coefficients, (2) Calculation of adjusted entity means, and (3)

Calculation of overall mean and significance tests of difference from the overall mean.

(1) Estimation of case-mix regression coefficients. We may think of this calculation as proceeding in

two stages: first calculating sufficient statistics (the statistics for each entity used in calculating the

coefficients) for regressions within each entity, and then pooling these estimates across the entities,

weighting the sufficient statistics by the corresponding entity weight. The weighting issues in the first

stage concern the weights given individual cases in the sufficient statistics for the within-entity

regressions, and in the second stage concern the weights used when pooling the within-entity estimates

across entities. In general, within-entity regression estimates that ignore the weights will be biased
and inconsistent if the weights are related to residuals from the regression, so it is advisable to use
the within-entity weights (if they are available) unless it is known that the sampling was conducted in
a way that does not create bias if the weights are ignored.

There is more leeway in choice of weights at the entity level when pooling the within-entity estimates.

Weighting each entity’s statistics by the sum of the case weights of cases in an entity yields estimated

coefficients that are representative of the entire population, by weighting the data from each entity by the

total population of the entity. While population representativeness is a common objective for analysis of

surveys, it has some disadvantages in CAHPS surveys because CAHPS results are reported for entities

rather than the population as a whole. If a few entities have much larger populations than others, they

could dominate estimation of the coefficients in a weighted regression; this could be undesirable because

the objective of regression modeling in case-mix adjustment is to estimate a model that fits reasonably

well across all the entities being compared, not just the largest ones or the pooled population.

Furthermore, such disproportionate weighting is generally less efficient statistically than a weighting that

is more uniform, yielding larger variances for the same amount of data. This approach may nonetheless be

desirable if the primary goal of the analysis is to obtain nationally representative estimates, for example,

for national comparison of subgroups of patients that cut across entities, such as those in different regions

or racial/ethnic groups.

Another option is to weight each entity’s data equally; this can be implemented by dividing each case’s

weight by the total weight for the entity. This serves the objectives of CAHPS analyses where the primary

objective is to compare entities or to examine effects of entity-level factors, but may be inefficient if the

sample sizes per entity vary greatly, especially if some entities have very large samples.

A third weighting option weights each entity by its number of respondents (“precision weighting”); this

can be implemented by multiplying each case’s weight by the ratio of number of respondents to total

weight for the entity. Holding other things approximately equal across entities (such as the residual

variance and the within-entity distribution of characteristics), this is statistically the most efficient

method. In this option, entities with small samples do not gain disproportionate weight. While the largest

entity samples do gain more influence in the regression, in many CAHPS applications the sample sizes

are bounded by design (or by limited resources) so a large entity population does not translate into a

proportionately large entity influence in the regression. A possible disadvantage of this method is that it

depends on the sample design and response/nonresponse patterns, and therefore has no clear population

interpretation. Nonetheless, we recommend this as the default option because it is the most robust and

often most statistically efficient method.

The final calculation of case (individual) weights for the case-mix regressions can be understood as

consisting of three steps:

• First, calculate within-entity weights that sum to 1 in each entity; these are equal to the weights

provided to the macro divided by the sum of the weights in each entity.

• Second, calculate entity weights using one of the options defined above.

• Third, multiply the within-entity weights by the entity weights to get the weight used in

regression.
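The three steps can be traced on a small example. The following sketch assumes the third entity-weighting option (weighting each entity by its number of respondents); the data set work.resp_demo and all column names are hypothetical, and the code illustrates the arithmetic without reproducing the macro's implementation.

```sas
/* Hypothetical respondents with sampling weights */
data work.resp_demo;
   length plan $ 16;
   input plan $ id wgt;
   datalines;
PLAN_A 1 10
PLAN_A 2 30
PLAN_B 3 5
PLAN_B 4 5
PLAN_B 5 10
;
run;

proc sql;
   create table work.reg_weights as
   select plan, id, wgt,
          wgt / sum(wgt)              as w_within,  /* step 1: sums to 1 within each entity */
          count(*)                    as w_entity,  /* step 2: number of respondents in the entity */
          (wgt / sum(wgt)) * count(*) as w_reg      /* step 3: weight used in the regression */
   from work.resp_demo
   group by plan
   order by plan, id;
quit;
```

For PLAN_A the within-entity weights are 0.25 and 0.75, the entity weight is 2, and the regression weights are 0.5 and 1.5. Under the other two options, w_entity would instead be the sum of the case weights in the entity (population weighting) or a constant (equal weighting).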

(2) Calculation of adjusted entity means: Because the entity means are calculated separately for each

entity, entity level weights are not relevant to this calculation. On the other hand, for the reasons

described above, the within-entity weighting is usually important to calculation of representative

estimates of entity means. Thus, we recommend that this calculation use any weights that vary across

cases within the same entity, and this is the only option in the Analysis Program.

(3) Calculation of overall mean and significance tests of difference from the overall mean: Since this

step operates only on the entity means, within-entity weight variation is not relevant here. The definition

of the overall mean affects both recentering of the entity means and significance tests of differences from

that mean. A complicating circumstance is that for a composite measure, the number of cases or total

weight may be different for each of the items going into the composite measure. The weighting choices for

calculation of the overall mean are to

a. weight the entity means equally,

b. use weights equal to the sum of the weights for cases within each entity, which produces an

estimate of the combined mean of the entire population of cases, or

c. use weights equal to the number of observations used in the calculation of the mean (or the total

of these numbers across the items of a composite measure).

These options are parallel to the options for entity weighting of case-mix adjustment models, but the two

selections are independent.
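To make the three choices concrete, the sketch below computes each version of the overall mean from a small table of entity results. The data set work.entity_means, its columns, and its values are invented for illustration; the column names ov_equal, ov_popwgt, and ov_nwgt are labels chosen here, not output variables of the Analysis Program.

```sas
/* Hypothetical entity-level results: adjusted mean, sum of case weights, usable records */
data work.entity_means;
   length plan $ 16;
   input plan $ adj_mean sum_wgt n_used;
   datalines;
PLAN_A 8.0 1000 200
PLAN_B 9.0 9000 300
PLAN_C 7.0 2000 250
;
run;

proc sql;
   create table work.overall as
   select mean(adj_mean)                         as ov_equal,   /* option a: equal entity weights */
          sum(adj_mean * sum_wgt) / sum(sum_wgt) as ov_popwgt,  /* option b: sum of case weights */
          sum(adj_mean * n_used) / sum(n_used)   as ov_nwgt     /* option c: number of observations */
   from work.entity_means;
quit;
```

Here the equally weighted mean is 8.0, while weighting by the sum of case weights gives about 8.58 because the large, high-scoring PLAN_B dominates, and weighting by the number of observations gives about 8.07.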

The choice of method for calculating the overall mean affects the tests of each entity’s difference from

that mean. For example, if one entity has a much larger enrollment than the others, and also an unusually

high mean score, it will pull up the overall mean so it will become more difficult for an entity to

demonstrate significantly better performance than average. With equal weighting of entities, the

comparison is to the mean of entity means, which generally lies in the “middle” of the entity scores but is

not necessarily representative of the combined population of cases.

We recommend choosing between these options based on the interpretation that will be given to the

reported overall mean and therefore to the comparison of each entity’s adjusted mean to that overall

mean. The usual comparisons of entities for quality reporting, incentives, and similar purposes are

intended to place each entity in relation to the collection of entities; we recommend the unweighted mean

of entities (equivalent to equal entity level weights) as the appropriate standard of comparison.
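
The three choices for the overall mean can be compared with a short Python sketch. It is illustrative only and is not part of the CAHPS Analysis Program; the function name and the data are hypothetical, and the coding of `overall_wt` follows Table B.1 (1 = equal, 2 = sum of case weights, 0 = number of respondents).

```python
def overall_mean(entity_means, weight_sums, n_respondents, overall_wt=1):
    """Combine entity means into an overall mean.

    overall_wt follows the coding shown in Table B.1:
      1 = equal weight per entity,
      2 = sum of case weights per entity,
      0 = number of respondents per entity.
    """
    if overall_wt == 1:
        weights = [1.0] * len(entity_means)
    elif overall_wt == 2:
        weights = list(weight_sums)
    elif overall_wt == 0:
        weights = list(n_respondents)
    else:
        raise ValueError("overall_wt must be 0, 1, or 2")
    total = sum(weights)
    return sum(w * m for w, m in zip(weights, entity_means)) / total


# Hypothetical data: the third entity is large and scores unusually high.
means = [3.2, 3.4, 3.9]
weight_sums = [100.0, 200.0, 1700.0]
respondents = [50, 60, 90]

equal_mean = overall_mean(means, weight_sums, respondents, overall_wt=1)
population_mean = overall_mean(means, weight_sums, respondents, overall_wt=2)
respondent_mean = overall_mean(means, weight_sums, respondents, overall_wt=0)
```

In this made-up example the population-weighted mean is pulled toward the large, high-scoring entity, which is the effect described above.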

Item Weighting – Algorithm for Composite Measures

The CAHPS Analysis Program uses item weights to compute the means of the composite measures for

each entity. There are three types of weights that users can select in the EVEN_WGT macro parameter.

To use the sum of the number of respondents for the item weights, select EVEN_WGT = 0. The

EVEN_WGT = 2 option uses the sum of the individual weights by each item for the item weight. For the

EVEN_WGT = 1 option, two methods are available for computation of the item weights. First, the item

weight equals one divided by the total number of items. So if equal weighting was chosen and there were

four items in the composite measure, the item weight is 1/4 = 0.25 for each item. An advantage of this

approach is that the relative weights of the items in the composite measure are consistent among survey

administrations. Furthermore, survey users may regard each item as equally important even if some are

answered more frequently than others. A disadvantage of this option is a possible loss of statistical

precision if an item with few responses is combined, equally weighted, with an item with many responses.

Thus, the EVEN_WGT = 1 option includes a second method that addresses this problem by down-weighting

low-response items.
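
As a rough illustration of the three EVEN_WGT choices, the following Python sketch computes item weights for one entity. It is not the macro's code: the function name and inputs are hypothetical, and normalizing the weights to sum to 1 is an assumption made for readability.

```python
def item_weights(n_responses, weight_sums, even_wgt=1):
    """Item weights for a composite measure (illustrative sketch).

    even_wgt = 0: proportional to the number of respondents per item
    even_wgt = 1: equal weights, one divided by the number of items
    even_wgt = 2: proportional to the sum of individual weights per item
    """
    n_items = len(n_responses)
    if even_wgt == 1:
        return [1.0 / n_items] * n_items
    if even_wgt == 0:
        raw = list(n_responses)
    elif even_wgt == 2:
        raw = list(weight_sums)
    else:
        raise ValueError("even_wgt must be 0, 1, or 2")
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical composite with four items.
counts = [60, 20, 20, 100]
wsums = [120.0, 30.0, 50.0, 200.0]

equal = item_weights(counts, wsums, even_wgt=1)
by_count = item_weights(counts, wsums, even_wgt=0)
by_weight = item_weights(counts, wsums, even_wgt=2)
```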

Variance Estimation

Variances are calculated for the mean for each entity, conditional on the coefficients for the adjuster

variables. Conditionally these means are independent (ignoring the recentering constant that is added to

make the mean of the adjusted means equal to that of the unadjusted means for presentation purposes).

Conditioning on the regression coefficients is a standard procedure in variance estimation in the analysis

of surveys (see Cochran, Sampling Techniques, 1977, Chapter 7). It is not difficult to allow for the

covariance of the adjusted means due to uncertainty about the regression coefficients in the case of single-

item reports, but it is difficult to do this in a general way for the multi-item composite measures, when the

pattern of missing data varies by item. In the interest of consistency, we use the same procedure for both

classes of reports.

Code Description and Resources

Case Weighting

Table B.1 lists the types of weight options available in the CAHPS Analysis Program.

Table B.1 List of Macro Weight Options

| Phase of estimation | Option | Indications and Advantages | CAHPS Analysis Program options |
|---|---|---|---|
| Regression for case-mix coefficients: within-entity weighting | Use weights | Generally recommended; makes estimates more population-representative and reduces chance of bias due to association of weights with outcomes | Only option allowed if weights are provided (wgtresp) |
| | Ignore weights | Use when inefficiency of estimation with unequally-weighted data is a problem; consider possible biases first. | Do not provide weights |
| Regression for case-mix coefficients: entity-level weighting | Population: sum of case weights | When population-weighted regression coefficients are of interest | (wgtresp and wt_type = 2) |
| | Equal by entity | When primary objective is comparison among entities of equal importance | (wgtresp and wt_type = 1) |
| | Precision: by number of respondents | Maintain statistical efficiency (precision of coefficients) when responding sample sizes for entities vary greatly | (wgtresp and wt_type = 0) |
| Calculation of entity means: within-entity weighting | Use individual-level weights | Generally recommended. | Only option allowed if weights are provided (wgtmean) |
| | Ignore weights | Only if weights are known to be irrelevant. | Do not provide weights |
| Recentering and tests of adjusted means: entity-level weighting | Preserve and test against unweighted mean of means | Relevant when entities are treated as equal members of population of entities (as in comparisons for quality reporting or incentives) | Unweighted option (overall_wt = 1) |
| | Population: sum of case weights | Relevant when testing the entity mean against the population mean is desired. | Population weight option (overall_wt = 2) |
| | Use number of respondents | This is the most efficient weight. It can be used when there is no information about the population. | Number of respondents option (overall_wt = 0) |

Table B.2 shows how each type of weight can be calculated.

Table B.2 Case Weighting Used for Case-mix Coefficients

The last three columns show the weight options: the entity total for each option and the derived case estimation weights.

| Entity | Survey weight | Within-entity weight = (Survey weight)/(Entity total) | Population weighting of entities (Entity total = sum of survey weights) | Equal weighting of entities (Entity total = 1) | Precision weighting of entities (Entity total = number of respondents) |
|---|---|---|---|---|---|
| Hxxxx | 30 | 30/180 = .1667 | 30 | 0.1667 | 0.833 |
| Hxxxx | 40 | 40/180 = .2222 | 40 | 0.2222 | 1.111 |
| Entity Hxxxx total and weight in estimation | 180 | 1 | 180 | 1 | 5 |
| Hyyyy | 30 | 30/80 = .3750 | 30 | 0.375 | 1.125 |
| Hyyyy | 20 | 20/80 = .2500 | 20 | 0.25 | 0.75 |
| Entity Hyyyy total and weight in estimation | 80 | 1 | 80 | 1 | 3 |

Explanation of weight calculation:

1. Survey weight is entered by the user as a variable in the input data set. It may incorporate

sampling, nonresponse, and/or post-stratification weights. If no weights will be used, this is set to

1 for every responding case. The macro calculates the sum of each respondent’s weight for each

entity.

2. Within-entity weight is the fraction of the entity’s total weight that is assigned to a specific case,

defined as the survey weight of the observation divided by the entity sum of these weights.

3. The total estimation weight in each entity is determined by the entity weighting option chosen

(the entity total rows in Table B.2).

a. For each entity, this total is allocated to the observations in the entity in proportion

to their within-entity weights. For population weighting, this recovers the original survey

weights.

b. Because the number of responses may vary across items, these calculations are

repeated for each item.
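
The steps above can be sketched in Python as follows. This is an illustration and not the macro's code: the function name is hypothetical, the `wt_type` coding follows Table B.1, and the survey weights beyond the two shown per entity in Table B.2 are made up so that the entity totals (180 and 80) and respondent counts (5 and 3) match the table.

```python
def estimation_weights(survey_weights, wt_type):
    """Derive case estimation weights for one entity.

    wt_type = 2: population weighting (entity total = sum of survey weights)
    wt_type = 1: equal weighting of entities (entity total = 1)
    wt_type = 0: precision weighting (entity total = number of respondents)
    """
    weight_sum = sum(survey_weights)
    # Step 2: within-entity weight is each case's share of the entity's weight.
    within = [w / weight_sum for w in survey_weights]
    # Step 3: the entity total depends on the weighting option.
    if wt_type == 2:
        entity_total = weight_sum
    elif wt_type == 1:
        entity_total = 1.0
    elif wt_type == 0:
        entity_total = float(len(survey_weights))
    else:
        raise ValueError("wt_type must be 0, 1, or 2")
    # Step 3a: allocate the total in proportion to the within-entity weights.
    return [entity_total * f for f in within]


# First two weights per entity are from Table B.2; the rest are hypothetical.
hxxxx = [30, 40, 50, 35, 25]   # sums to 180, 5 respondents
hyyyy = [30, 20, 30]           # sums to 80, 3 respondents

hx_population = estimation_weights(hxxxx, wt_type=2)
hx_equal = estimation_weights(hxxxx, wt_type=1)
hx_precision = estimation_weights(hxxxx, wt_type=0)
hy_precision = estimation_weights(hyyyy, wt_type=0)
```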

Item Weighting – Algorithm for Composite Measures

When each item weight is assigned equally by selecting the equal weight option (EVEN_WGT = 1) to

calculate the composite measure mean, a problem may arise if some of the items have low responses. To

solve this problem, the even weight option has a method to assign the item weight by downweighting

low-response items.

The first modification is motivated by the fact that responses to different items in the same composite

measure often have different mean values for a variety of reasons, including how frequently problems

arise in different kinds of interactions and services and how the questions are worded. If the items are

weighted the same way for every entity to calculate the composite measure, the effect of these unequal

means across entities is minimal. However, if the item weights differ across entities, this could give rise to

variations unrelated to variations in quality.

Thus, we first modify the calculation of weighted composite measures to minimize the impact of such

differences in item means on expected scores. To explain the need for this modification, suppose $y_i$ is the mean score for item i at a given entity, and $\mu_i$ is the mean score for item i across all entities. With weights $w_i$ that sum to 1, the composite measure score is $\sum_i w_i y_i$ for a specific plan, and if that plan is at the average on all measures, its score is $\sum_i w_i \mu_i$. If the overall means $\mu_i$ differ, this last expression will depend on $w_i$; in other words, even two plans that are average on every measure will receive different composite measure scores if the composite measures are calculated with different weights.

To remove this dependence, we center the scores at their means before combining them. Suppose now that $w_i$ represents the weight for item i at a particular entity, and $w_{0i}$ represents some standard weights common to the entire report. Now define a composite measure score as

$$\sum_i w_i (y_i - \mu_i) + \sum_i w_{0i} \mu_i$$

Any entity that is average ($y_i = \mu_i$) on every item will receive the same composite measure score $\sum_i w_{0i} \mu_i$

regardless of the weights $w_i$, so bias due purely to weighting is removed even if different entities are

scored with different weights. Note that the second term of this composite measure score expression is the

same for every entity; it is included only to bring the average back to an interpretable level as an average

score of overall means.

Given this modification, we can now consider modifying item weights for different entities. The main

requirement is that the weight must be zero ($w_i = 0$) when there are no responses for item i; we also want

the weights to be equal (or at least to approach equality) when there is “adequate” sample for every item.

One simple weighting mechanism meeting these requirements is as follows:

• Set $w_{0i} = 1/I$, $i = 1, \ldots, I$, where I is the number of items in the composite measure.

• Choose a cutoff number of observations K; weights will not be modified for items with at least K

observations.

• Define entity-specific weights $w_i = \min(n_i, K) \big/ \sum_{i'=1,\ldots,I} \min(n_{i'}, K)$, where $n_i$ is the number of

responses from the entity for item i, and $\min(n_i, K)$ is the lesser of $n_i$ and K.

• Calculate composite measure scores as described above.

This procedure has the following desirable properties:

• For each entity, all items with at least K responses are given equal weight. Consequently, there is

no modification to equal item weighting for entities with large samples.

• Items with no responses in a given entity are given no weight, so the composite measure score

can still be calculated.

• Items with low numbers of responses (<K) are given reduced weight so their effect on variance is

mitigated.

• The criterion for determining whether an item will be downweighted is very simple to describe.

The procedure can easily be modified for unequal baseline weights $w_{0i}$.

Table B.3 illustrates the calculation of item weights for various scenarios in a composite measure with

three items, assuming that the target minimum sample size K=20.

Table B.3 Examples of a composite measure with three items using a macro parameter K

| Sample sizes $n_i$ | $\min(n_i, K)$ | Calculation of weights $w_i$ | Weights $w_i$ simplified | Interpretation |
|---|---|---|---|---|
| 60, 70, 80 | 20, 20, 20 | 20/60, 20/60, 20/60 | 1/3, 1/3, 1/3 | Every item has adequate sample so equal weighting is OK. |
| 0, 22, 24 | 0, 20, 20 | 0/40, 20/40, 20/40 | 0, 1/2, 1/2 | Item with no responses gets no weight. |
| 10, 22, 34 | 10, 20, 20 | 10/50, 20/50, 20/50 | 1/5, 2/5, 2/5 | One item has low response and is downweighted. |
| 2, 3, 5 | 2, 3, 5 | 2/10, 3/10, 5/10 | 2/10, 3/10, 5/10 | If all samples are small, weight each item proportional to the number of responses to improve the efficiency of estimation. |
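
The downweighting rule illustrated in Table B.3 can be sketched in a few lines of Python. This is an illustration under the equal-baseline-weight assumption, not the macro's code; the function name is hypothetical, and exact fractions are used so the results can be compared directly with the table.

```python
from fractions import Fraction


def downweighted_item_weights(sample_sizes, cutoff):
    """Entity-specific item weights: min(n_i, K) divided by the sum of min(n_i, K)."""
    capped = [min(n, cutoff) for n in sample_sizes]
    total = sum(capped)
    if total == 0:
        raise ValueError("no responses to any item; the composite cannot be scored")
    return [Fraction(c, total) for c in capped]


K = 20  # target minimum sample size, as in Table B.3
adequate = downweighted_item_weights([60, 70, 80], K)
one_empty = downweighted_item_weights([0, 22, 24], K)
one_low = downweighted_item_weights([10, 22, 34], K)
all_small = downweighted_item_weights([2, 3, 5], K)
```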

Table B.4 illustrates the calculation of the “centered” weighted average in an entity in which one item of

the composite measure has few responses (third line of table above), again assuming K=20.

Table B.4 Examples of composite measure with three items using a macro parameter K and means

| Description | Symbol | Item 1 | Item 2 | Item 3 |
|---|---|---|---|---|
| Baseline equal weighting | $w_{0i}$ | 1/3 | 1/3 | 1/3 |
| Overall (all entities) mean | $\mu_i$ | 3.45 | 2.75 | 2.65 |
| Mean in a specific entity | $y_i$ | 3.55 | 2.80 | 2.75 |
| Sample sizes in that entity | $n_i$ | 10 | 22 | 34 |
| Weights in that entity | $w_i$ | 1/5 | 2/5 | 2/5 |
| Centered entity means | $y_i - \mu_i$ | 0.10 | 0.05 | 0.10 |

The baseline weighting is assumed to be equal for the three items. Thus, the overall mean composite

measure score is (3.45+2.75+2.65)/3 = 2.95.

Because at the specific entity of interest there are only 10 responses for Item 1, it is given half the weight

of each of the other items. The weighted mean for the entity is then

(1/5)3.55 + (2/5)2.80 + (2/5)2.75 = 2.93. Note that this is below the overall mean composite measure

score, despite the fact that the entity is above the mean on each item, because the item that generally has a

high score is downweighted.

To calculate the score by the proposed method, we first calculate the centered means (last line of table),

which are all positive. Their weighted mean is (1/5)0.10 + (2/5)0.05 + (2/5)0.10 = 0.08. We then add

this mean deviation to the overall mean, 0.08 + 2.95 = 3.03, which is the reported

score. This correctly reflects the superiority of this entity across all the items.
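
The worked example can be checked with a short Python sketch. The function name is hypothetical and the code is an illustration of the centered calculation, not the macro itself; the numbers are those of Table B.4.

```python
def centered_composite(entity_means, overall_means, entity_weights, baseline_weights):
    """Weighted mean of centered item means, plus the baseline-weighted overall mean."""
    deviation = sum(
        w * (y - mu) for w, y, mu in zip(entity_weights, entity_means, overall_means)
    )
    baseline = sum(w0 * mu for w0, mu in zip(baseline_weights, overall_means))
    return deviation + baseline


overall = [3.45, 2.75, 2.65]      # overall (all entities) item means
entity = [3.55, 2.80, 2.75]       # item means in the specific entity
w_entity = [1 / 5, 2 / 5, 2 / 5]  # downweighted weights in that entity
w_base = [1 / 3, 1 / 3, 1 / 3]    # baseline equal weights

# Uncentered weighted mean: falls below the overall composite mean of 2.95.
naive_score = sum(w * y for w, y in zip(w_entity, entity))
# Centered score: reflects that the entity is above average on every item.
reported_score = centered_composite(entity, overall, w_entity, w_base)
```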

Detailed Explanation with Statistical Notations

Case-mix adjustment

Let $y_{ipj}$ represent the response to item i of respondent j from entity p (after recoding, if any, has been performed). The model for adjustment of a single item i is of the form

$$y_{ipj} = \beta_i' x_{ipj} + \mu_{ip} + \epsilon_{ipj}$$

where $\beta_i$ is a regression coefficient vector, $x_{ipj}$ is a covariate vector consisting of two or five adjuster covariates (as described above), $\mu_{ip}$ is an intercept parameter for entity p, and $\epsilon_{ipj}$ is the error term. The estimates are given by the following equation:

$$(\hat\beta_i', \hat\mu_i')' = (X'X)^{-1} X' y_i$$

where $\mu_i = (\mu_{i1}, \mu_{i2}, \ldots, \mu_{iP})'$ is the vector of intercepts, $y_i$ is the vector of responses and the covariate matrix is

$$X = (X_a \;\; u_1 \;\; u_2 \;\; \cdots \;\; u_P)$$

where the columns of $X_a$ are the vectors of values of each of the adjuster covariates, and $u_p$ is a vector

of indicators for membership in entity p, p = 1, 2,…P, with entries equal to 1 for respondents in entity p

and 0 for others.

Finally, the estimated intercepts are shifted by a constant amount to force their mean to equal the mean of the unadjusted entity means $\bar{y}_{ip}$ (to make it easier to compare adjusted and unadjusted means), giving adjusted entity means

$$\hat a_{ip} = \hat\mu_{ip} + \frac{\sum_{p'} w_{p'} \bar{y}_{ip'}}{\sum_{p'} w_{p'}} - \frac{\sum_{p'} w_{p'} \hat\mu_{ip'}}{\sum_{p'} w_{p'}}$$

where $w_p$ is the sum of the response weights for each entity p and $\hat\mu_{ip}$ is the adjusted entity mean before

recentering.

For single-item responses, these adjusted means are reported. For composite measures, the several

adjusted entity means are combined with equal item weights (one divided by the number of items as

default), that is, by calculating the mean across items.
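
The recentering step can be sketched in Python as follows. This is a minimal illustration, assuming the shift is the difference between the weighted mean of the unadjusted entity means and the weighted mean of the estimated intercepts; the function name, the optional entity weights, and the data are hypothetical.

```python
def recenter(intercepts, unadjusted_means, entity_weights=None):
    """Shift estimated entity intercepts by a constant so that their (weighted)
    mean equals the (weighted) mean of the unadjusted entity means."""
    if entity_weights is None:
        entity_weights = [1.0] * len(intercepts)  # equal weighting of entities
    total = sum(entity_weights)

    def weighted_mean(values):
        return sum(w * v for w, v in zip(entity_weights, values)) / total

    shift = weighted_mean(unadjusted_means) - weighted_mean(intercepts)
    return [m + shift for m in intercepts]


# Hypothetical intercepts and unadjusted means for three entities.
intercepts = [1.0, 1.5, 2.0]
unadjusted = [3.0, 3.4, 3.8]

adjusted_equal = recenter(intercepts, unadjusted)
adjusted_weighted = recenter(intercepts, unadjusted, entity_weights=[1.0, 1.0, 2.0])
```

Because the same constant is added to every entity, differences between entities are unchanged by recentering.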

Variance of difference from national mean

We first calculate residuals from the regression model for every item response,

$$z_{ipj} = y_{ipj} - \hat\beta_i' x_{ipj}$$

where $\hat\beta_i$ is the regression coefficient vector for item i and $y_{ipj}$ is the response to item i from person j in entity p. The adjusted mean $\hat\mu_{ip}$ for entity p, item i, is the mean (across nonmissing observations) of $z_{ipj}$ weighted by $w_{ipj}$, where $w_{ipj}$ is defined by the weight for item i from person j in entity p. If we replace $z_{ipj}$ with 0 for all missing responses and define $r_{ipj} = 1$ if there is a nonmissing response and 0 otherwise, then we can write this as

$$\hat\mu_{ip} = \frac{\sum_j w_{ipj} z_{ipj}}{\sum_j w_{ipj} r_{ipj}}$$

and the composite measure score for the entity is

$$\hat\mu_p = \sum_i w_i \frac{\sum_j w_{ipj} z_{ipj}}{\sum_j w_{ipj} r_{ipj}}$$

where $w_i$ is the composite measure item weight for item i. Linearizing this expression by taking derivatives with respect to each of the sums $\sum_j w_{ipj} z_{ipj}$ and $\sum_j w_{ipj} r_{ipj}$, we obtain the following approximation:

$$\hat\mu_p \approx \text{constant} + \sum_j \sum_i \frac{w_i w_{ipj}}{n_{ip}} \left( z_{ipj} - r_{ipj} \bar{z}_{ip} \right) = \text{constant} + \sum_j d_{pj}$$

where $n_{ip} = \sum_j w_{ipj} r_{ipj}$ is the (weighted) number of responses to item i from entity p, $d_{pj}$ is defined by the summand, and $\bar{z}_{ip}$ is the weighted mean of $z_{ipj}$ for the item i in entity p. We now apply the standard formula for the variance of an estimated sum,

$$V_p = \mathrm{Var}\Big(\sum_j d_{pj}\Big) = \frac{n_p}{n_p - 1} \sum_j d_{pj}^2$$

where $n_p$ is the number of respondents from entity p. This gives an estimate of a variance of the

composite measure score for entity p. If the composite measure consists of a single item, or if there is no

item nonresponse, these results correspond to the standard variance formula.
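
The correspondence with the standard variance formula can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the macro's code: every respondent has case weight 1, every item has at least one response, and each respondent's linearized contribution is taken to be the sum over answered items of (item weight) × (residual minus item mean) / (number of responses to the item). The function name and data are hypothetical.

```python
import statistics


def composite_variance(residuals, item_weights):
    """Variance of a composite score for one entity, unit case weights.

    residuals[j][i] is respondent j's residual for item i, or None if missing.
    """
    n_p = len(residuals)
    n_items = len(item_weights)
    # Number of nonmissing responses and mean residual for each item.
    counts = [sum(1 for row in residuals if row[i] is not None) for i in range(n_items)]
    item_means = [
        sum(row[i] for row in residuals if row[i] is not None) / counts[i]
        for i in range(n_items)
    ]
    # Linearized contribution of each respondent (missing items contribute 0).
    d = [
        sum(
            item_weights[i] * (row[i] - item_means[i]) / counts[i]
            for i in range(n_items)
            if row[i] is not None
        )
        for row in residuals
    ]
    return n_p / (n_p - 1) * sum(dj * dj for dj in d)


# Single item, no missing data: matches the usual variance of a sample mean.
values = [1.0, 2.0, 4.0, 5.0]
v_single = composite_variance([[v] for v in values], [1.0])
v_standard = statistics.variance(values) / len(values)

# Two-item composite with one missing response (hypothetical data).
v_composite = composite_variance(
    [[1.0, 2.0], [2.0, None], [4.0, 3.0], [5.0, 4.0]], [0.5, 0.5]
)
```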

Note that we do not apply any finite population corrections in this variance calculation. The finite

population correction is appropriate if the object of our inference is the mean rating from the population

of members or patients who are in entity p at the present time. Our concern, however, is with predicting

the mean rating that would represent the experiences of a new set of subscribers or patients joining or

seeking care at the entity, because we are attempting to give guidance to those who are considering anew

their choice of insurance or treatment site. Conceptually, we regard the present members as a sample from

a super-population of potential users of the entity.

Global F-test

The weighted grand mean is calculated as

$$\hat\mu = \frac{\sum_p w_p \hat\mu_p}{\sum_p w_p}$$

where $\hat\mu_p$ is the adjusted mean for entity p and $w_p$ is the weight from entity p. Then the F-statistic is calculated as

$$F = \frac{1}{P - 1} \sum_p \frac{(\hat\mu_p - \hat\mu)^2}{V_p}$$


This statistic has an approximate F distribution with (P-1, q) degrees of freedom; we have found in

simulations that q = n/P (the average sample size per entity) makes the F-test at worst slightly

conservative with typical sample sizes and response distributions. In other words, reported p-values from

the test are slightly larger than they should be, so significant differences are less likely to be declared.
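
A minimal Python sketch of the grand mean and F-statistic follows. It assumes the statistic is the sum over entities of the squared deviation from the grand mean divided by that entity's variance, all divided by P - 1; the function name and data are hypothetical, and no p-value is computed.

```python
def global_f_statistic(entity_means, entity_variances, entity_weights=None):
    """Return the weighted grand mean and the global F-statistic."""
    n_entities = len(entity_means)
    if entity_weights is None:
        entity_weights = [1.0] * n_entities  # equal weighting of entities
    total = sum(entity_weights)
    grand_mean = sum(w * m for w, m in zip(entity_weights, entity_means)) / total
    f_stat = sum(
        (m - grand_mean) ** 2 / v for m, v in zip(entity_means, entity_variances)
    ) / (n_entities - 1)
    return grand_mean, f_stat


# Hypothetical adjusted means and variances for three entities.
grand, f_value = global_f_statistic([3.0, 3.4, 3.8], [0.04, 0.04, 0.04])
```

The statistic would then be referred to an F distribution with (P-1, q) degrees of freedom, with q = n/P as described above.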

T-tests for entity differences from mean

We compare each entity mean to the mean of the entity means using a t-test. The corresponding contrast is

$$\Delta_p = \hat\mu_p - \frac{1}{P} \sum_{p'} \hat\mu_{p'} = \frac{P - 1}{P} \hat\mu_p - \frac{1}{P} {\sum_{p'}}^{*} \hat\mu_{p'}$$

where $\sum^{*}$ represents a sum over all entities except entity p. Note that the last expression is simply (P-1)/P times the difference of $\hat\mu_p$ from the mean of all entities except entity p; therefore, the two formulations (mean vs. mean of all, or mean vs. mean of all others) are equivalent. The variance of $\Delta_p$ is

$$V(\Delta_p) = \left( \frac{P - 1}{P} \right)^2 V_p + \frac{1}{P^2} {\sum_{p'}}^{*} V_{p'}$$

and the t-statistic is calculated as $\Delta_p \big/ \sqrt{V(\Delta_p)}$

, and referred to a t distribution with

( )

1−

degrees of

freedom, which again is usually slightly conservative.
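
The contrast and its variance can be sketched in Python as follows. This is an illustration with equal entity weights, treating the entity means as independent; the function name and data are hypothetical, and the degrees of freedom are not computed here.

```python
import math


def entity_contrast(entity_means, entity_variances, p):
    """Difference of entity p's mean from the mean of all entity means,
    with its variance and t-statistic."""
    n_entities = len(entity_means)
    delta = entity_means[p] - sum(entity_means) / n_entities
    other_variance = sum(v for q, v in enumerate(entity_variances) if q != p)
    variance = (
        ((n_entities - 1) / n_entities) ** 2 * entity_variances[p]
        + other_variance / n_entities ** 2
    )
    return delta, variance, delta / math.sqrt(variance)


means = [3.0, 3.4, 3.8]
variances = [0.04, 0.04, 0.04]
delta, var_delta, t_stat = entity_contrast(means, variances, p=2)

# Equivalent formulation: (P-1)/P times the difference from the mean of the others.
mean_of_others = (means[0] + means[1]) / 2
delta_alt = (2 / 3) * (means[2] - mean_of_others)
```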

Appendix C. Summary of Features Included in Each Version of

the CAHPS Analysis Program

Version 1.0 of the CAHPS SAS Analysis Program offered the following features:

• An assessment of significance using practical and statistical (p-value) criteria;

• An option to analyze data based on outpatient utilization groupings;

• An option to analyze child and adult data together or separately;

• Comparisons of health plan performance; and

• Case-mix adjustments.

Version 1.5 of the CAHPS SAS Analysis Program added the following enhancements:

• Weighting and stratification. The SAS program performs the correct analyses for

disproportionate stratified sampling designs. One way such designs might appear is when two

plans that were surveyed separately have subsequently merged their operations into a single

business entity, and their results will be reported as a single plan. They also may appear when the

sponsor decides to collect additional surveys by using larger sample sizes for a certain subset of

people (based on geographic area, gender, age groups, etc.) beyond what would appear there by

proportionate allocation. To use this feature, the user must specify which strata are combined and

the number of members in each stratum out of the entire population (the weights).

• Plan name flexibility. Plan identifiers for programming and output purposes are no longer

required to be numeric. Text or numeric names are allowed to facilitate programming and

interpretation of results.

• Case-mix adjusters. The program no longer requires two case-mix adjusters (age and health

status) to be used in the analyses. The user can now specify an unlimited number of adjuster

variables or choose not to adjust the data.

• Substantive differences. A new method of specifying an absolute difference that must be

achieved before a difference is meaningful has been added to the program. While the previous

method of determining a meaningful difference is still available, the user can now simply choose

an absolute difference that must exist between means for a difference to be flagged as significant.

• Results tables. Version 1.5 has an additional feature that creates SAS data sets of the results

tables the program produces. This allows users to perform additional analyses on the aggregate

results or to create summary reports. Linear regression coefficients for the adjuster variables are

now output as part of the results tables and reports.

• Missing data for adjusters. In the initial version of the Analysis Program, missing data for the

case-mix adjustment variables was imputed at each item’s health plan mean. Version 1.5 allows

the user to specify whether or not the analysis is conducted with imputation for the adjuster

variables.

Versions 2.0 and 2.1 of the CAHPS SAS Analysis Program added the following enhancements and

changes:

• The SAS code has been converted to require only Base SAS and the SAS/STAT module,

eliminating the need for SAS/IML. If adjuster variables are excluded, then the REG procedure in

the SAS/STAT module is not needed. The code has been modularized into macros to aid in

maintaining the macro and understanding what the macro is doing.

• The macro now has two additional ways in which to subset the data being run through the

Analysis Program without having to create separate calls of the Analysis Program. With

SUBSET = 2, the Analysis Program runs the case-mix model on the entire data set but does the

plan/entity comparisons at the subset levels specified in the fourth column of the plan detail file

created by the user. With SUBSET = 3, the Analysis Program does both the case-mix and the

plan/entity comparisons at the subset levels.

• Data sets are now created for the output of the case-mix and hypothesis test calculations. This

allows for easy export to Excel or other programs for report generation.

• The composite measures are no longer restricted to the “How Often” (1-4) question

responses. The variable type is indicated in the macro call and the macro runs a composite

measure calculation if the number of variables is greater than one. This change was made to

accommodate the need to create composite measures from questions with dichotomous and

trichotomous variables. The program can now create composite measures using all variable types

used in the survey.

• The weighting of the composite measure items now has the option of doing equal weighting

across items as well as weighting based on the number of responses in each item divided by the

total number of responses in all items. The default option for the macro is to use the equal

weighting.

• An option is available for recoding the global rating scales from 0 – 10 to 1 – 3 and the “How

Often” scales from 1 – 4 to 1 – 3 using the new parameter RECODE. The primary rationale for

the recoding into three categories is to make the data entering into the hypothesis tests entirely

consistent with the information presented in the “Bar Graph” reports.

• A secondary rationale for recoding is that it may improve the statistical properties of the tests. On

general statistical principles, it would not be surprising if the analysis of very skewed data were

improved by a transformation that reduced the skewness. In the CAHPS survey, it is plausible

that the difference between 0 and 2, both indicating strong dissatisfaction, carries with it less

information than the difference between 8 and 10, reflecting average and maximum satisfaction,

respectively. Therefore, combining categories at the low end of the scale may remove some

meaningless variation from the data. Statistical improvement would be reflected in larger values

of the F-statistic in the recoded data compared to the original data.

The recoding is defined as:

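| Option | Rating scale response value | Recode | How often scale response value | Recode |
|---|---|---|---|---|
| Option 1 | 0 – 6 | 1 | 1 – 2 | 1 |
| | 7 – 8 | 2 | 3 | 2 |
| | 9 – 10 | 3 | 4 | 3 |
| Option 2 | 0 – 7 | 1 | | |
| | 8 – 9 | 2 | | |
| | 10 | 3 | | |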

• A new parameter, KP_RESID, has been added to the macro call to allow the residual values

from the regression to be saved as a permanent SAS data set. By default, these values are only

saved temporarily while the macro is running.
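
The three-category recoding defined in the table above can be expressed as a short Python sketch. It is illustrative only and is not the macro's code; the function names and the `option` argument are hypothetical and do not represent values of the RECODE parameter.

```python
def recode_rating(value, option=1):
    """Recode a 0-10 global rating into 1-3.

    Option 1: 0-6 -> 1, 7-8 -> 2, 9-10 -> 3
    Option 2: 0-7 -> 1, 8-9 -> 2, 10 -> 3
    """
    upper_bounds = {1: (6, 8), 2: (7, 9)}
    if option not in upper_bounds:
        raise ValueError("option must be 1 or 2")
    if not 0 <= value <= 10:
        raise ValueError("rating must be between 0 and 10")
    low_max, mid_max = upper_bounds[option]
    if value <= low_max:
        return 1
    if value <= mid_max:
        return 2
    return 3


def recode_how_often(value):
    """Recode a 1-4 'How Often' response into 1-3: 1-2 -> 1, 3 -> 2, 4 -> 3."""
    mapping = {1: 1, 2: 1, 3: 2, 4: 3}
    if value not in mapping:
        raise ValueError("response must be 1, 2, 3, or 4")
    return mapping[value]


ratings_option1 = [recode_rating(v, option=1) for v in range(11)]
ratings_option2 = [recode_rating(v, option=2) for v in range(11)]
how_often = [recode_how_often(v) for v in (1, 2, 3, 4)]
```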

Versions 3.0-3.3 of the CAHPS SAS Analysis Program added the following enhancements and changes:

• The plan detail file, plandtal.dat, and the filename statement that assigns PLAN_DAT are

optional. If the plan detail file does not exist, then the macro uses the PLAN variable in the data

set called by the CAHPS macro. If used, the plan detail file must have a unique record for each

plan name or code. Only the first column is required; if the second column is missing, then the

macro creates dummy values for the new plan name equivalent to the first column. If the third

and fourth columns have missing values, then they are all set to the value of 1. Each column must

be separated by spaces.

• The Analysis Program removes from the analysis any plans that have zero or one usable

record. This change was made in the submacro USABLE. The plans that are dropped

by the macro are saved in a permanent SAS data set labeled dp&outname.

• The CHILD variable is optional. If it does not exist, then the macro creates the variable CHILD.

If the ADULTKID parameter is set to 2, then the macro assumes all records in the analysis data

set are child records and sets CHILD = 1, otherwise CHILD will be set to 0, indicating there are

no child records. If there is a mix of child and adult records in the data set, the user must set up a

variable named CHILD and set it equal to 1 for child records and some other value, usually 0 for

adult records. Version 3.3 of the CAHPS macro corrects a logic error found in version 3.2 of the

macro.

• The EVEN_WGT parameter can apply individual level weights to the composite measure items.

This third option is activated by setting EVEN_WGT=2 and uses the weight variable, referenced

by WGTRESP.

• The variance of the mean variable, vp, was added to the text output of the adjusted mean report.

• A CAHPS version label was added to the permanent data sets to indicate which version of the

CAHPS Analysis Program created the data set. The version number was also added to the text

output.

• Users can case-mix the triple-stacked bar frequencies, using the ADJ_BARS parameter, and

include both the non-case-mixed frequencies with the case-mixed frequencies in the final

frequency output data set, n_*. Variables of type 5 (vartype = 5) cannot have case-mixed bars

since the frequencies for the response values are not aggregated into three bars. To

make this work for nonstandard variable types, it is best to do some recoding first to make the

three desired ranges and then run the new variable through as a vartype = 4.

The following parameters were added:

• The parameter ID_RESP stores the original respondent ID value, if one exists, in the permanent

data sets. If there is a unique variable in the data set that identifies each respondent, then enter the

variable name in this parameter. The macro carries it through the individual data sets and attaches

it to the residual data set if KP_RESID = 1 so the data set can be easily linked to the original if

needed. If no ID variable is entered, then the ID_RESP variable in the macro is set to ‘.z’. The

variable is a character variable with a maximum length of 50 characters.

• The parameter flag OUTREGRE indicates whether or not the regression output should appear in

the text output file. If set to 0, the default, then the SAS printed output from the regressions in the

case-mix procedure is not printed out into the output file. If set to 1, then the regression output

appears.

• The parameter WGTRESP accepts the variable name that contains the weights for individual

respondents. This weight is used in the case-mix adjustment regression procedure.

• The parameter WGTMEAN accepts a variable that contains the weights to be applied to the

means of the plans before the case-mix adjustments are applied.

• The parameter SPLITFLG allows the data set to be split into two groups for the purpose of

centering the means differently and running two case-mix models through the macro. This was

done to deal with the Medicare Managed Care and Fee-for-Service analysis. By default, the

parameter is 0 and is not used but, if set to 1, then the data set must contain a variable with the

name SPLIT and must have the values of 0 and 1. Any record with a missing value is dropped

from the analysis.

• The parameter BAR_STAT stores the results of the case-mixed bars in permanent data sets with

the same format as the case-mixed survey question results. The new data sets created have the

format B#&outname and F#&outname where the B* files hold the stars and statistics by plan and

the F* files hold the overall means and statistics. The # has the values 1-3 for a normal macro run,

where 1 = the first bar frequency, 2 = the second bar frequency, and 3 = the third bar frequency if

it is not dichotomous. &outname is the value given in the macro call parameter OUTNAME. If

the data are stratified and stratification weights are used by having the macro parameter

WGTDATA = 2, up to six additional files are created with # having the values A-C, where A =

the first bar frequency of the combined strata, B = the second bar frequency of the combined

strata, and C = the third bar frequency of the combined strata.

• Version 3.3 corrects a logic error, contained within version 3.2, that occurred when the parameter

SUBSET = 3, which runs the macro multiple times based on the subsetting variable in the plan

detail file referenced by the FILENAME PLANDTAL statement.

• The text output on the Warnings and Parameter Info page contains more accurate information

about the adjusters when there are child interactions, when ADULTKID = 1. The number of

adjusters will reflect the original adjuster variables times 2 plus 1, so if there are originally 2

adjusters, the total number of adjusters with child interactions will be 5, ADJ#1, ADJ#2, ADJ#1 *

CHILD, ADJ#2 * CHILD, and CHILD.

• Two flag lines added to the log file indicate if the macro finds the CHILD and PLAN variables in

the original analysis data set. If there is no child variable, the flag indicates how the macro

created a new CHILD variable.

Version 3.4 (May-June 2003) of the CAHPS SAS Analysis Program added the following enhancements

and changes:

• Added three variables to the sa* data set and to the output text of the statistical tests. The

unweighted, unadjusted plan mean was added to help clarify what the unadjusted mean actually

is. The unweighted, unadjusted mean differs from the weighted, unadjusted mean only when the

wgtmean parameter is used. The second variable is the 95% Confidence Limits for the Difference

of the Mean, computed as 1.96 * the standard error of the difference. When wgtplan = 1, a third

column containing the summed weights for each plan is also added to the sa* data set, to the b*

data set if frequency bars are to be stored (bar_stat = 1), and to the output text.

• Added the weighted, unadjusted frequencies to the frequency table n_* data set and to the output

text when the frequency bars are also case-mix adjusted.

• Expanded the purpose of the wgtmean parameter so that the sum of the weights at the plan level

can be used in the comparison of the plan means. If a variable is specified for the wgtmean

parameter, the individual record-level weight is used to compute the weighted, unadjusted plan

means. In addition, if the new parameter wgtplan = 1, the individual weights summed to the plan

level are used to weight the plan mean comparisons. The wgtplan parameter can take the value 0

(the default) or 1. When it is 0, the macro uses equal weights when comparing the plan means.

When it is 1 and the wgtmean parameter names a variable, the weights summed to the plan level

are used in computing the overall and grand means, which are used in the statistical comparisons

of the plan means.

• Added checks on the DATASET parameter to make sure it exists or that the value in the

DATASET parameter is a valid SAS data set. If there is an error, the macro will stop processing

and print an error message to the log file.

• Added error checking on the merging of the plan detail file with the analysis data set. If there are

no records matching, then the macro will print out the frequencies of the unique PLAN values for

both the plan detail file and the analysis data set to the output text file and also print out an error

to the log file.
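
The confidence limit added in Version 3.4 is stated above as 1.96 * the standard error of the

difference. A minimal Python sketch of that arithmetic follows. The function name and the example

values are hypothetical, and forming an interval as difference plus or minus the limit is a standard

convention assumed here, not something taken from the macro code.

```python
Z_95 = 1.96  # multiplier quoted in the Version 3.4 notes


def difference_limits(diff, se_diff):
    """Return the 95% limit (half-width) and the implied interval.

    Assumption: the interval is diff +/- 1.96 * se_diff.
    """
    half_width = Z_95 * se_diff
    return half_width, (diff - half_width, diff + half_width)


# Hypothetical plan-versus-average difference of 0.30 with SE 0.10.
half_width, (lower, upper) = difference_limits(0.30, 0.10)
print(round(half_width, 3), round(lower, 3), round(upper, 3))
```

In this example the interval excludes zero, which is the usual reading of a statistically significant

difference at the 5% level.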

Version 3.5 (September 2005) of the CAHPS SAS Analysis Program added the following enhancements

and changes:

• A disclaimer and copyright statement were added.

• If weights are being used for the individual or plan means, records with weights that are less than

zero or missing are removed.

• When the macro converts the numeric plan in allcases to character, it left-justifies the value and

trims trailing blanks.

• The macro checks that there are plans in all subcodes after the usable data set is made. If some

subcodes have all missing plans, it recomputes how the subcodes are used in the looping in the

star macro.

• The log comment issued when the CHILD variable is not found in the original data set was changed.

• A bug was identified in the CAHPS 3.4b macro: two lines declared length planname $ 20 when the

length should have been $ 40, which caused a merge problem with the N_* data sets. $ 20 was

changed to $ 40.
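
The weight rule introduced in Version 3.5 (records with weights that are less than zero or missing are

removed) can be sketched as a simple filter. This is a Python illustration with hypothetical record and

field names, not the macro's SAS code. Note that, as the rule is worded, a weight of exactly zero is kept.

```python
import math


def has_usable_weight(weight):
    """True unless the weight is missing (None or NaN) or less than zero."""
    if weight is None:
        return False
    if isinstance(weight, float) and math.isnan(weight):
        return False
    return weight >= 0


# Hypothetical records; "plan" and "weight" are illustrative field names.
records = [
    {"plan": "A", "weight": 1.5},
    {"plan": "A", "weight": -0.2},  # negative: removed
    {"plan": "B", "weight": None},  # missing: removed
    {"plan": "B", "weight": 0.0},   # zero is not less than zero: kept
]
kept = [r for r in records if has_usable_weight(r["weight"])]
print([r["weight"] for r in kept])  # [1.5, 0.0]
```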

Version 3.6 (April 2006) of the CAHPS SAS Analysis Program added the following enhancements and

changes:

• This new version corrects an error in some previous versions affecting calculation of the

variances for the comparison of an entity mean to the mean of all other plan means, when the

plans were weighted. This error only affects analyses with parameter wgtplan=1 using CAHPS

macro versions 3.4b (released May 2003) and 3.5 (released September 2005). By default, the

macro sets wgtplan=0 so the error does not affect unweighted plan analysis.

• The error caused significance tests to be calculated incorrectly when determining whether an

entity's mean was significantly above or below the average. This could cause some plans to be

declared 1- or 3-star plans when they were respectively below or above average, but not by a

statistically significant amount.

• (July 2006) Modified the formula for the special case in which only one plan or entity is analyzed,

where a division-by-zero error could occur. This case worked in prior versions. The code now

checks whether SE is missing and sets T=0 in that case. Also, VO can have a zero denominator

when only one entity is analyzed; the code was modified to catch that error.

• (3.6b as of June 2007) This modification to Version 3.6 puts the _wgtmean variable in the strata

data step in order to address a problem with a missing line that was not keeping the _WGTPLAN

variable in the data step that created wstemp. Because of the missing line, the use of wgtdata=2

for combining strata generated a SAS error.
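
The July 2006 guard (set T=0 when SE is missing) can be illustrated with a small Python sketch. The

function name is hypothetical, the statistic is assumed to be a plain ratio of the difference to its

standard error, and treating a zero SE the same way as a missing SE is an added assumption; the

macro's actual formulas are described in the Explanation of Statistical Calculation section.

```python
import math


def t_statistic(diff, se):
    """Return diff / se, or 0.0 when se cannot be used.

    Assumptions: T is a simple ratio, and a zero SE is handled like a
    missing SE so that no division by zero occurs.
    """
    if se is None or math.isnan(se) or se == 0:
        return 0.0
    return diff / se


print(t_statistic(0.5, 0.25))   # 2.0
print(t_statistic(0.5, None))   # 0.0 (single-entity case: SE is missing)
```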

Version 4.0 (September 2011) and 4.1 (April 2012) of the CAHPS SAS Analysis Program added the

following enhancements and changes:

• One part of the code that creates the plandtal data set (in the usable macro program) was modified.

This change only affects runs with subset = 3.

• The calculation of weights for the composite measure items was modified. The sum of weights

based on the number of responses from each item is used as the weight of the composite measure

case. Also, the calculation of item weights for even_wgt = 1 was modified. For more details about

how the weights are computed, please see the Explanation of Statistical Calculation section.

• A new warning note was added to the macro output (in the mkreport macro program). The note

lists plan IDs that have zero responses in the measured items.

• A new option for smoothing the variances was added. Users can assign a weight to the variances

through an optional parameter called smoothing. The default is smoothing = 0, which provides the

original variances. If smoothing is greater than zero, the value that the user supplies is used as the

weight for the variances. If smoothing is less than zero, the weight is computed automatically

inside the macro. For more details about how that weight is computed, please see the Explanation

of Statistical Calculation section.

• The SAS procedure PROC STANDARD was replaced with PROC STDIZE. When adjusters are

used, the macro centers all adjusters before it runs the regression procedure. PROC STANDARD

was not applicable when some adjusters contain only a single repeated value and, as a result, did

not standardize those values correctly. PROC STDIZE is able to handle this situation.

• (April 2012) Modified the code for computing adjusted composite measure means when the

composite measure even-weight option (even_wgt = 1) is selected. The macro computes the

weights for all entities regardless of the sample size. In the prior version, this caused incorrect

adjusted means when some entities did not make it to the final analysis due to the sample size.

Users can assign the minimum number of responses for each composite measure item, called K,

so the weights can be assigned differently in each item depending on the value of K. Version 4.1

is able to handle this case and provide appropriate adjusted means.
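
The centering step mentioned above (the macro centers all adjusters before running the regression)

amounts to subtracting each adjuster's mean from its values. The Python sketch below shows that

arithmetic only, with hypothetical data; it does not reproduce PROC STANDARD or PROC STDIZE. It

includes an adjuster that takes a single repeated value, the situation the Version 4.0 change addresses.

```python
def center(values):
    """Subtract the mean so the centered values average to zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical adjuster columns.
age_group = [1, 2, 3, 4]
constant_adjuster = [2, 2, 2, 2]  # every respondent has the same value

print(center(age_group))          # [-1.5, -0.5, 0.5, 1.5]
print(center(constant_adjuster))  # [0.0, 0.0, 0.0, 0.0]
```

A constant adjuster centers to all zeros, so it carries no information for the regression.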

Version 5.0 (November 2016) of the CAHPS SAS Analysis Program added the following enhancements

and changes:

• (November 2016) Updated weighted variance estimation. This update to the CAHPS macro

corrects an error in the previous version, which failed to take differential weighting at the

individual level into account in variance estimation. Mean scores were not affected by this

update.

• (February 2017) Added a new weight option for calculating case-mix regression coefficients

(WT_TYPE). One option (WT_TYPE = 1) is to weight equally by entity. The other option

(WT_TYPE = 2) is used when population-weighted regression coefficients are of interest. The

default (WT_TYPE = 0) is to weight by number of respondents.

• (February 2017) Added weighted overall mean option (OVERALL_WT). OVERALL_WT = 0

uses number of respondents, OVERALL_WT = 1 assigns equal weight, and OVERALL_WT = 2

uses the plan weight assigned in &WGTPLAN, which is the default in the program.

• (February 2017) Modified the default value of the parameter for suppressing the results of

regression models (OUTREGRE). The default value is now 1 instead of 0.

• (February 2017) Added VARDEF option to PROC MEANS. This affects the calculation of

weighted standard deviation.

• (February 2017) Added PROC SURVEYREG as one of the regression options (PROC_TYPE).

Users can select either PROC REG or PROC SURVEYREG. The SURVEYREG procedure is

designed to handle complex survey sample designs, and support for one design feature, clustering,

was added in version 5.0. Selecting PROC_TYPE = 1 runs PROC SURVEYREG. The default is

PROC_TYPE = 0, which uses PROC REG for the regression.

• (February 2017) Updated the content of the case-mix regression coefficients output file

(C_&OUTNAME). The output contains the standard errors and the p-values for each of the

case-mix estimates.

• (February 2018) Modified the composite measure calculation. The overall mean of each item is

computed first, before the item means are combined into the composite measure mean.

• (January 2019) Modified the Test SAS data set to change from the Health Plan Survey version

4.0 to the Clinician & Group Survey version 3.0.

• (June 2020) Modified all test programs to correspond with the updated Test SAS data, and

clarified instructions. Numbered programs to delineate which should be run first and separated

the steps required to prepare the data for analysis from the macro call statements.
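
The three OVERALL_WT settings listed above differ only in which weights enter the overall mean. A

minimal Python sketch under stated assumptions: the overall mean is taken to be a weighted average

of plan means, the function name and example numbers are hypothetical, and the macro's own

calculation may include details not shown here.

```python
def overall_mean(plan_means, n_respondents, plan_weights, overall_wt):
    """Weighted average of plan means for a given OVERALL_WT setting.

    0 = weight by number of respondents, 1 = equal weight,
    2 = plan weight (the value supplied through &WGTPLAN).
    """
    if overall_wt == 0:
        weights = n_respondents
    elif overall_wt == 1:
        weights = [1.0] * len(plan_means)
    elif overall_wt == 2:
        weights = plan_weights
    else:
        raise ValueError("overall_wt must be 0, 1, or 2")
    total = sum(weights)
    return sum(w * m for w, m in zip(weights, plan_means)) / total


# Two hypothetical plans.
means = [3.0, 4.0]
respondents = [100, 300]
plan_wts = [3.0, 1.0]

for setting in (0, 1, 2):
    print(setting, overall_mean(means, respondents, plan_wts, setting))
# 0 3.75
# 1 3.5
# 2 3.25
```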