bias in recommendations). Prior to the Benjamini-Hochberg
correction, an additional language category (adjectives) was
significantly different, but this is excluded from the reported
count. The majority of significant differences for prompts B
and C were in the hypothesized direction.
No differences were hypothesized for prompt A (H1A, “research
position”). Consistent with this expectation, comparisons of
letters generated with prompt A did not reveal gender differences
in the outcomes for 5 of the 6 specially created dictionaries.
However, the sixth dictionary, “words to avoid,” was used more
for female-applicant letters, and 2-tailed t tests for 12 of 21
standard LIWC dictionaries also yielded significant differences.
Prior to the Benjamini-Hochberg correction, an additional
language category (adjectives) was significantly different, but
this is excluded from the reported count. Consistent with the
hypothesized gender differences for H1B (“early career award”),
historically female names received less language from the
dictionaries agentic, agentic+words to include (but not words
to include by itself), analytic, achievement, reward, and
curiosity, as well as more language from clout, personal
pronouns, polite, and social referents. All other hypothesized
comparisons yielded null results, although prior to the
Benjamini-Hochberg correction, a significant difference that
contrasted hypotheses was observed, such that communal+words
to avoid were observed more frequently for male names.
Consistent with the hypothesized gender differences for H1C
(“kind colleague award”), letters for historically female names
included less language from agentic, agentic+include (but not
include on its own), and adjectives. Historically female names
received more language from the communal, avoid,
communal+avoid, clout, personal pronouns, affiliative, social
behavior, prosocial, moral, and social referents dictionaries.
Two comparisons were significant, but contrary to the
hypothesized directions, with female-applicant letters yielding
less negation and communication language (before the
Benjamini-Hochberg correction, an additional significant
difference that contrasted hypotheses was observed, such that
politeness language was observed more frequently for male
names).
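Several results above were significant before, but not after, the Benjamini-Hochberg correction. As a minimal illustration of how that can happen, the step-up procedure can be sketched as follows. The P values and the false discovery rate of 0.05 are hypothetical assumptions for illustration, not values taken from this study, and the function name is ours.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the test stays significant
    under the Benjamini-Hochberg step-up procedure.

    alpha is an assumed false discovery rate, not a value from the study.
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose P value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Every test ranked at or below max_k is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            significant[idx] = True
    return significant


# Hypothetical raw P values for 5 language dictionaries.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
uncorrected = [p < 0.05 for p in raw_p]
corrected = benjamini_hochberg(raw_p)
```

In this hypothetical set, the third and fourth comparisons fall below .05 on their own but do not survive the correction, which is the pattern described for the adjectives category.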
Study 2
The results of 2-tailed independent sample t tests that examined
longer, more specific variants of the study 1 prompts Aₗ
(“...research position in Colorado...”), Bₗ (“...biological
scientist...”), and Cₗ (“...hardworking compassionate
colleague...”) partially supported a priori hypotheses H2A
(prompt Aₗ) and H2B (prompt Bₗ) and primarily did not support
H2C (prompt Cₗ). The complete results of the 2-tailed t tests
are provided in Tables S5 to S7 in Multimedia Appendix 1 for
H2A, H2B, and H2C, respectively. Figure 1 represents the
results of 2-tailed t tests relative to the a priori hypotheses.
Analyses of letters generated using prompt Aₗ (“...research
position in Colorado...”) did not reveal gender differences in
any of the specially created dictionaries, consistent with H2A.
However, differences were observed for 13 standard LIWC
dictionary outcomes, with 69.2% of these in the direction
hypothesized in H2A and H2B. Only 4 of the significant
differences replicated observations for prompt A (historically
female names had more clout and social referents and less
negation and social behavior in both prompts).
As hypothesized, letters generated with prompt Bₗ (“...biological
scientist...”) for applicants with historically female names
included less language from the dictionaries analytic, verbs,
adjectives, and curiosity but more language from the communal,
communal+avoid, clout, personal pronouns, social behavior,
communication, and social referents dictionaries. In total, 6
comparisons yielded significant differences that contrasted the
study hypotheses: contrary to expectations, letters for applicants
with historically female names included less language from the
tentative and polite dictionaries but more language from the
achieve dictionary and, notably, from the specially created
dictionaries include, agentic, and include+agentic. Prompt Bₗ
comparisons replicated significant differences in only 5 of the
language dictionaries that were observed for prompt B, all of
which were in the hypothesized direction: analytic, clout,
personal pronouns, social referents, and curiosity.
Contrary to hypotheses (H2C), for letters generated with prompt
Cₗ (“hardworking compassionate colleague...”), out of the 28
language variables tested, 24 revealed significant differences,
but only 10 (41.7%) of these were in the hypothesized direction.
Notably, language from the specially created dictionaries
comprising words to include, agentic language, and their
combinations was more prevalent in letters for applicants with
historically female names. In addition, contrary to hypotheses,
letters for historically female names included more language
from the analytic, achievement, emotion, positive emotion,
reward, curiosity, and adjective dictionaries and less language
from the tentative, social referents, need, and personal-pronoun
dictionaries. Prompt Cₗ comparisons replicated the significant
differences in 6 language dictionaries that were observed for
prompt C, all of which were in the hypothesized direction: words
to avoid, communal, communal+avoid, affiliative, social
behavior, and moralization dictionaries.
Study 3
Independent sample 2-tailed t tests comparing outcome variables
in letters written for Mary versus James revealed differences in
15 outcome variables as well as lower word counts for Mary
letters. Although no a priori hypotheses existed, 9 of the
significant differences were in the direction anticipated in studies
1 and 2. Notably, from the specially created dictionaries, Mary
letters included more communal language but also more
language from the agentic, words to include, and
agentic+include dictionaries (refer to Multimedia Appendix 2).
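The comparison described above can be sketched as the test statistic of an independent-sample t test. This is a minimal sketch under stated assumptions: it uses the pooled-variance (Student) form, which the text does not confirm was the variant used, and the scores are hypothetical rather than study data. The 2-tailed P value additionally requires the t distribution, available, for example, through `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t, df) for a pooled-variance independent-sample t test.

    Assumes equal variances in the two groups; this is an illustrative
    choice, not necessarily the variant used in the study.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    # Standard error of the difference between the two group means.
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / std_err
    return t, df


# Hypothetical dictionary scores for two small sets of letters.
scores_a = [1, 2, 3, 4, 5]
scores_b = [2, 4, 6, 8, 10]
t_value, dof = pooled_t_statistic(scores_a, scores_b)
```

A negative t here means the first group scored lower on average than the second. When group variances differ, as the Levene results for several outcomes indicate, an unequal-variance (Welch) test is a common alternative.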
Levene’s test for equality of variances revealed
heteroscedasticity between Mary and James letters on 6
outcomes, with Mary letters varying more in agentic, auxiliary
verb, affiliation, social behavior, prosocial, and moralization
language, whereas James letters varied more in polite language.
The full results of Levene’s test are provided in Multimedia
Appendix 2. When letters were split into 4 groups of 25 letters
each for Mary and James, Levene’s test revealed
heteroscedasticity among 25-letter groups within Mary letters
for the following 4 outcomes: tentative, prosocial, risk, and
J Med Internet Res 2024 | vol. 26 | e51837 | p. 8 | https://www.jmir.org/2024/1/e51837
(page number not for citation purposes)
Kaplan et al | JOURNAL OF MEDICAL INTERNET RESEARCH