Science Correcting Itself | Correcting Errors
Correcting Errors
Correction is an important element of scientific progress. In the history of scientific investigation, it has been common for widely accepted ideas to be overturned. The literature contains many instances of such self-correction, though the corrections are sometimes not as widely known as the original findings. Here, we catalog some corrections, all of which are instances science can be proud of. Investigators are invited to submit further examples of successful scientific self-correction to be listed here.
Attitude Importance and Attitude Accessibility
Roese and Olson argue that attitude importance and accessibility are intrinsically linked and that accessibility indicates importance; if an attitude is easily accessible to an individual then it is also important to him or her. Bizer and Krosnick take issue with that assertion, demonstrating that, with regard to a particular attitude, either accessibility or importance can exist without the other. They also found that attitude importance does influence accessibility, but accessibility does not necessarily determine importance.
Original Article
Roese, N.J., & Olson, J.M. (1994). Attitude importance as a function of repeated attitude expression. Journal of Experimental Social Psychology, 30, 39-51.
Correction
Bizer, G.Y., & Krosnick, J. A. (2001). Exploring the structure of strength-related attitude features: Between attitude importance and attitude accessibility. Journal of Personality and Social Psychology, 81, 566-586.
Bystander Effect
In 1964, after the brutal murder of Catherine “Kitty” Genovese, the New York Times published an article reporting that thirty-seven of Genovese’s neighbors witnessed the attack and did nothing, failing to either intervene or call the police. In response, many researchers published articles on the Bystander Effect, claiming that the presence of multiple observers of a situation leads to a diffusion of responsibility, making each person less likely to get involved in that situation, even if it means helping someone who needs it. In recent years, however, scholars have questioned the Bystander Effect as well as the veracity of the Kitty Genovese story itself. In reality, very few neighbors witnessed her murder, and several people who were witnesses did call the police or attempt to help Genovese. Research has shown that in dangerous or life-threatening situations, bystanders are willing to help a victim even if they are part of a large group.
Original Articles
Darley, J.M., & Latané, B. (1968). Bystander intervention in emergencies: Diffusion of responsibility. Journal of Personality and Social Psychology, 8(4), 377-383.
Gansberg, M. (1964, March 27). 37 who saw murder didn’t call the police. The New York Times, pp. 1, 38.
Latané, B., & Darley, J.M. (1968). Group inhibition of bystander intervention in emergencies. Journal of Personality and Social Psychology, 10(3), 215-221.
Latané, B., & Darley, J.M. (1969). Bystander “apathy.” American Scientist, 57(2), 244-268.
Latané, B., & Nida, S. (1981). Ten years of research on group size and helping. Psychological Bulletin, 89(2), 308-324.
Rosenthal, A.M. (1964, May 3). Study of the sickness called apathy. The New York Times, pp. 24, 66, 69, 70, 72.
Rosenthal, A.M. (1999). Thirty-eight witnesses: The Kitty Genovese case. Brooklyn, NY: Melville House Publishing.
Corrections
Cook, K. (2014). Kitty Genovese: The murder, the bystanders, the crime that changed America. New York, NY: W. W. Norton & Company.
Fischer, P., Krueger, J.I., Greitemeyer, T., Vogrincic, C., Kastenmüller, A., Frey, D., … Kainbacher, M. (2011). The bystander-effect: A meta-analytic review on bystander intervention in dangerous and non-dangerous emergencies. Psychological Bulletin, 137(4), 517-537.
Lemann, N. (2014, March 10). A call for help: What the Kitty Genovese story really means. The New Yorker, 73-77.
Manning, R., Levine, M., & Collins, A. (2007). The Kitty Genovese murder and the social psychology of helping: The parable of the 38 witnesses. American Psychologist, 62(6), 555-562.
Extraverts & Social Situations
Lucas and Diener (2001) examined whether extraverts’ greater enjoyment of social situations reflects a specific preference for social interaction or a broader sensitivity to pleasant experiences. Across three studies, participants rated a range of situations on both their sociality and pleasantness. The results showed that extraverts rated social situations more positively than introverts only when those situations were perceived as pleasant. Notably, the same pattern held for nonsocial situations, with extraverts reporting greater enjoyment than introverts when the situations were pleasant. These findings led the authors to conclude that sensitivity to reward and positive affect, rather than social interaction per se, may be more central to extraversion.
Two decades later, Lucas (2021) publicly reassessed these conclusions as part of the Loss-of-Confidence Project, reporting reduced confidence in the original findings due to methodological concerns. He identified small and inconsistently determined sample sizes across the three studies, as well as the exclusion of multiple participants based on outlier status or suspected misunderstanding of instructions, as key weaknesses that may have compromised the reliability of the results. Lucas emphasized the need for clearer procedures, larger samples, and more consistent exclusion criteria.
Original Article
Lucas, R. E., & Diener, E. (2001). Understanding extraverts' enjoyment of social situations: The importance of pleasantness. Journal of Personality and Social Psychology, 81(2), 343–356.
Correction
Lucas, R. E. (2021). Putting the Self in Self-Correction: Findings From the Loss-of-Confidence Project. Perspectives on Psychological Science, 16(6), 1255–1269.
Gateway Belief Model
The Gateway Belief Model (GBM) describes a process of attitudinal change in which a shift in people’s perception of the scientific consensus on an issue leads to subsequent changes in their attitudes, which in turn predict changes in support for public action. According to the original study, “The Scientific Consensus on Climate Change as a Gateway Belief: Experimental Evidence,” by van der Linden and colleagues, advising study subjects that “97% of climate scientists have concluded that human-caused climate change is happening” induces subjects to revise upward their own estimate of the proportion of scientists who subscribe to this position.
The GBM has been contested by many other researchers since the original study’s publication. One of the most prominent challenges comes from Kahan in “The ‘Gateway Belief’ Illusion: Reanalyzing the Results of a Scientific Consensus Messaging Study.” Kahan insists that “the point of [his] paper was not to determine which position is correct on the use of consensus messaging. It was only to assure that scholars would have access to all the data collected” in the work by van der Linden and his colleagues.
Kerr and Wilson also contest the original study supporting the GBM. In “Perceptions of Scientific Consensus Do Not Predict Later Beliefs about the Reality of Climate Change: A Test of the Gateway Belief Model Using Cross-lagged Panel Analysis,” they report findings contrary to those of the original study, writing that their “results suggest that individuals’ perceptions of a consensus among scientists do not have a strong influence on their personal beliefs about climate change.”
While van der Linden and his colleagues subsequently reported a large-scale replication of the original study in “The gateway belief model: A large-scale replication,” many researchers continue to dispute the GBM and view it as an ‘illusion.’
Original Articles
van der Linden, S.L., Leiserowitz, A.A., Feinberg, G.D., & Maibach, E.W. (2015). “The Scientific Consensus on Climate Change as a Gateway Belief: Experimental Evidence.” PLOS One.
Corrections
Kahan, D.M. (2017). “The ‘Gateway Belief’ Illusion: Reanalyzing the Results of a Scientific Consensus Messaging Study.” Journal of Science Communication 16(5): 1-20.
Kerr, J.R. & Wilson, M.S. (2018). “Perceptions of Scientific Consensus Do Not Predict Later Beliefs about the Reality of Climate Change: A Test of the Gateway Belief Model Using Cross-lagged Panel Analysis.” Journal of Environmental Psychology 59: 107-110.
van der Linden, S.L., Leiserowitz, A.A., Feinberg, G.D., & Maibach, E.W. (2019). “The gateway belief model: A large-scale replication.” Journal of Environmental Psychology 62.
van der Linden, S.L., Leiserowitz, A.A., Feinberg, G.D., & Maibach, E.W. (2017). “Gateway Illusion or Cultural Cognition Confusion?” Journal of Science Communication 16(5).
Media Coverage
Leber, R. (2015). “Meet the 97 Percent Climate Truthers.” The New Republic.
Mooney, C. (2019). “Researchers Think They’ve Found a ‘Gateway Belief’ That Leads to Greater Science Acceptance.” The Washington Post, WP Company.
van der Linden, S. (2015). “How to Combat Distrust of Science.” Scientific American.
Mechanisms of Semantic Priming
Heyman and colleagues (2015) investigated whether semantic priming reflects automatic, capacity-free processing or instead depends on cognitive resources. Using a lexical decision task under varying levels of working memory load, the authors compared three types of word pairs: symmetrically associated pairs (e.g., answer–question), forward asymmetrically associated pairs in which the prime predicts the target (e.g., panda–bear), and backward asymmetrically associated pairs in which the association runs from target to prime (e.g., ball–catch). They found that symmetric pairs showed priming in both directions, whereas forward asymmetric pairs showed priming only from prime to target. Based on these patterns, the authors concluded that prospective processes involved in forward priming require cognitive resources, while retrospective processes underlying symmetric priming operate more automatically and with minimal capacity demands.
In a later statement as part of the Loss-of-Confidence Project, Heyman (2021) expressed reduced confidence in these conclusions due to concerns about the experimental manipulation used to distinguish prospective from retrospective processes. He noted that the three stimulus types differed on several dimensions beyond their presumed associative structure, introducing confounding variables that may have influenced the results. As a consequence, the evidence supporting the claim that certain forms of semantic priming are automatic and capacity-free was weakened. Heyman concluded that the findings should be interpreted with caution and emphasized the need for further work using more tightly controlled designs to clarify the mechanisms underlying semantic priming.
Original Article
Heyman, T., Van Rensbergen, B., Storms, G., Hutchison, K. A., & De Deyne, S. (2015). The influence of working memory load on semantic priming. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(3), 911–920.
Correction
Heyman, T. (2021). Putting the Self in Self-Correction: Findings From the Loss-of-Confidence Project. Perspectives on Psychological Science, 16(6), 1255–1269.
Mixed Stereotypes
This work examined whether mixed stereotypes, in which a social group is perceived as high on one dimension such as warmth but low on another such as competence, can be measured using the Implicit Association Test (IAT). Carlsson and Björklund (2010) reported that they had developed a novel IAT-based method capable of capturing such mixed stereotype content, arguing that implicit measures could disentangle multiple evaluative dimensions simultaneously. Their findings were presented as evidence that the IAT could move beyond simple positive–negative associations to assess more complex stereotype structures.
More than a decade later, Carlsson (2021) publicly reassessed the study’s conclusions as part of the Loss-of-Confidence Project and reported substantially reduced confidence in the original claims. He identified several methodological and analytic issues that undermined the validity of the findings, including flexibility in hypothesis testing and post hoc analytic decisions. Notably, the statistical approach used in the second study differed from that of the first and was introduced after peer review, raising concerns about researcher degrees of freedom. Carlsson also acknowledged inappropriate interpretation of nonsignificant effects as evidence for the null hypothesis despite low statistical power. Additional concerns included the omission of a third IAT measuring general group attitudes that was not disclosed in the original manuscript, as well as the collection and exclusion of an undisclosed behavioral measure based on subjective judgments during debriefing.
Original Article
Carlsson, R., & Björklund, F. (2010). Implicit stereotype content: Mixed stereotypes can be measured with the implicit association test. Social Psychology, 41, 213–222.
Correction
Carlsson, R. (2021). Putting the Self in Self-Correction: Findings From the Loss-of-Confidence Project. Perspectives on Psychological Science, 16(6), 1255–1269.
Motor Simulation and Depth Perception
Witt and Proffitt (2008) proposed that distance perception is shaped by anticipated action through motor simulation, such that intending or imagining an action can alter how far objects appear. In a series of experiments, participants estimated distances to targets that were out of reach but reachable with a tool. Participants who anticipated or imagined using a tool perceived targets as closer, whereas imagining an impossible action had no effect on perception. Interference experiments showed that disrupting motor simulation by squeezing a rubber ball while reaching eliminated the distance compression effect, supporting the role of motor simulation in perception.
In a later self-correction, Witt (2021) reported reduced confidence in these findings due to small sample sizes and inappropriate statistical analyses that treated repeated distance estimates as independent observations. When reanalyzed using proper methods, several experiments produced null or mixed results. Witt concluded that the original evidence for motor simulation influencing distance perception was less robust than initially claimed.
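The statistical problem named above, treating repeated estimates from the same participant as independent observations, can be illustrated with a small simulation. The sketch below is not Witt's reanalysis: the two-group design, the sample sizes, the variance values, and the function names are all hypothetical choices made for illustration. It simulates data with no true group difference and compares a t-test on all trials pooled together against a t-test on one mean per participant.

```python
import random


def mean(values):
    return sum(values) / len(values)


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (assumes equal variances)."""
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (ma - mb) / (pooled_var * (1 / len(a) + 1 / len(b))) ** 0.5


def pseudoreplication_rates(n_sims=1000, seed=1):
    """Return (false-positive rate treating trials as independent,
    false-positive rate using one mean per participant)."""
    n_per_group, n_trials = 10, 20  # hypothetical design
    crit_trials = 1.966  # two-sided 5% critical t for df = 2*10*20 - 2 = 398
    crit_means = 2.101   # two-sided 5% critical t for df = 2*10 - 2 = 18
    rng = random.Random(seed)
    naive_hits = 0
    aggregated_hits = 0
    for _ in range(n_sims):
        groups = []
        for _group in range(2):
            participants = []
            for _person in range(n_per_group):
                # Each participant has a personal tendency shared by all
                # of their trials; there is no true group difference.
                tendency = rng.gauss(0, 1)
                participants.append(
                    [tendency + rng.gauss(0, 1) for _ in range(n_trials)]
                )
            groups.append(participants)
        trials_a = [x for person in groups[0] for x in person]
        trials_b = [x for person in groups[1] for x in person]
        if abs(pooled_t(trials_a, trials_b)) > crit_trials:
            naive_hits += 1
        means_a = [mean(person) for person in groups[0]]
        means_b = [mean(person) for person in groups[1]]
        if abs(pooled_t(means_a, means_b)) > crit_means:
            aggregated_hits += 1
    return naive_hits / n_sims, aggregated_hits / n_sims


if __name__ == "__main__":
    naive, aggregated = pseudoreplication_rates()
    print(f"trials treated as independent: {naive:.3f}")
    print(f"one mean per participant:      {aggregated:.3f}")
```

Under these assumptions the trial-level test rejects the true null far more often than the nominal 5%, while the participant-level test stays close to it, which is why results can change when such data are reanalyzed with methods that respect the repeated-measures structure.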
Original Article
Witt, J. K., & Proffitt, D. R. (2008). Action-specific influences on distance perception: A role for motor simulation. Journal of Experimental Psychology: Human Perception and Performance, 34(6), 1479–1492.
Correction
Witt, J. K. (2021). Putting the Self in Self-Correction: Findings From the Loss-of-Confidence Project. Perspectives on Psychological Science, 16(6), 1255–1269.
Name-Order Effect
The Name-Order Effect refers to the theory that political candidates whose names are listed first on the ballot are given an unfair advantage, as they receive more votes than they would have if they were not listed first. Ho and Imai argue that the Name-Order Effect is largely insignificant, mostly holding true only for minor party candidates and in nonpartisan elections. Alvarez et al. also emphasize that the Name-Order Effect is small where it exists. In some cases, they find that there is actually a negative effect on candidates’ vote shares if they are listed first or last on the ballot. Pasek et al., however, reaffirm the significance of the Name-Order Effect, arguing that it is sizable enough to impact elections.
Original Articles
Ho, D.E., & Imai, K. (2008). Estimating causal effects of ballot order from a randomized natural experiment: The California Alphabet Lottery, 1978-2002. Public Opinion Quarterly, 72(2), 216-240.
Ho, D.E., & Imai, K. (2006). Randomization inference with natural experiments: An analysis of ballot effects in the 2003 California recall election. Journal of the American Statistical Association, 101(475), 888-900.
Alvarez, R.M., Sinclair, B., & Hasen, R.L. (2006). How much is enough? The “Ballot Order Effect” and the use of social science research in election law disputes. Election Law Journal, 5(1), 40-56.
Correction
Pasek, J., Schneider, D., Krosnick, J.A., Tahk, A., Ophir, E., & Milligan, C. (2014). Prevalence and moderators of the candidate Name-Order Effect: Evidence from statewide general elections in California. Public Opinion Quarterly, 78 (2), 416-439.
Names & Career Outcomes
Silberzahn and Uhlmann (2013) reported that Germans with “noble-sounding” surnames were more likely to hold managerial positions than those with common surnames, based on an analysis of approximately 200,000 professionals. They argued that this pattern persisted after accounting for name frequency, suggesting that surname meaning might influence career outcomes.
These findings were later overturned by a reanalysis led by Simonsohn (2014), who showed that the original study relied on an incorrect assumption about the relationship between surname frequency and managerial status. When a matched-names analysis was used to properly control for name frequency, no association between surname meaning and career outcomes remained. In a later statement, Silberzahn (2021) acknowledged that the original conclusion was unsupported and emphasized the importance of analytic transparency and independent reanalysis when evaluating surprising findings.
Original Article
Silberzahn, R., & Uhlmann, E. L. (2013). It pays to be Herr Kaiser: Germans with noble-sounding surnames more often work as managers than as employees. Psychological Science, 24, 2437–2444.
Correction
Silberzahn, R., Simonsohn, U., & Uhlmann, E. L. (2014). Matched-Names Analysis Reveals No Evidence of Name-Meaning Effects: A Collaborative Commentary on Silberzahn and Uhlmann (2013). Psychological Science, 25(7), 1504–1505.
Author Response
Silberzahn, R. (2021). Putting the Self in Self-Correction: Findings From the Loss-of-Confidence Project. Perspectives on Psychological Science, 16(6), 1255–1269.
Noise & Hearing Effort
In this study, Brown and Strand (2019) tested whether the presence of a visual stimulus would change the effort required to recognize spoken words in background noise. Participants completed a word recognition task while listening to speech masked by noise. In one condition, the speech was accompanied by a visually animated circle; in the other, no visual stimulus was present. The authors initially reported that participants responded more quickly when the visual stimulus was present and interpreted this as evidence that the visual cue reduced listening effort, independent of working memory capacity.
In a subsequent correction, Strand, Brown, and Barbour (2020) identified an error in the stimulus presentation program that unintentionally slowed response times in the no-visual condition. When this error was corrected and the data were reanalyzed, the pattern reversed: the visual stimulus increased listening effort and did not improve speech recognition accuracy, despite participants reporting that the task felt easier.
Original Article
Brown, V. A., & Strand, J. F. (2019). Noise increases listening effort in normal-hearing young adults, regardless of working memory capacity. Language, Cognition and Neuroscience, 34(5), 628–640.
Correction
Strand, J. F., Brown, V. A., & Barbour, D. L. (2020). Talking points: A modulating circle increases listening effort without improving speech recognition in young adults. Psychonomic Bulletin & Review, 27, 536-543.
Power Posing
Power posing refers to the claim that briefly adopting expansive, high-power body postures can causally influence psychological states, hormone levels, and risk-taking behavior. In their original study, Carney, Cuddy, and Yap (2010) reported that participants who held expansive postures for short periods showed increased testosterone, decreased cortisol, and greater financial risk tolerance compared to those holding contractive postures. These findings were interpreted as evidence that nonverbal displays of power can rapidly alter both neuroendocrine functioning and behavior, and the study received widespread attention in both academic and popular media.
Subsequent research raised substantial doubts about the robustness of these effects. A high-powered direct replication by Ranehill et al. (2015), using similar procedures and a much larger sample, failed to find any effects of power posing on testosterone, cortisol, or risk-taking behavior, though participants did report feeling more powerful subjectively. In response, Carney and colleagues (2015) published a comprehensive review of 33 studies involving over 2,500 participants, concluding that while expansive postures reliably influence self-reported feelings of power, evidence for hormonal or behavioral effects is inconsistent or absent. One year later, Carney (2016) publicly revised her position, identifying multiple methodological weaknesses in the original research, including small sample sizes, weak and selectively reported effects, researcher degrees of freedom, potential experimenter bias, and inadequate control of confounding variables. She ultimately concluded that power pose effects are not real and discouraged further research on the topic, framing the case as an example of how cumulative evidence can overturn initially compelling findings.
Original Article
Carney, D. R., Cuddy, A. J., & Yap, A. J. (2010). Power posing: Brief nonverbal displays affect neuroendocrine levels and risk tolerance. Psychological Science, 21(10), 1363-1368.
Replication Attempt
Ranehill, E., Dreber, A., Johannesson, M., Leiberg, S., Sul, S., & Weber, R. A. (2015). Assessing the robustness of power posing: No effect on hormones and risk tolerance in a large sample of men and women. Psychological Science, 26(5), 653-656.
Author Response
Carney, D. R., et al. (2015). Review and summary of research on the embodied effects of expansive (vs. contractive) nonverbal displays. Psychological Science, 26(5), 657–663.
Author Reversal
Carney, D. (2016). My position on “power poses.” University of California, Berkeley, Haas School of Business.
Shooter Video Games and Effect on Firing Aim and Accuracy
Whitaker and Bushman argue that playing violent video games with a pistol-shaped controller can increase firing accuracy. This paper has been retracted by Communication Research due to irregularities in some variables in the data set. A replication of the study by Dr. Bushman is in review.
Original Article
Whitaker, J.L. & Bushman, B.J. (2012). “Boom, Headshot!” Effect of Video Game Play and Controller Type on Firing Aim and Accuracy. Communication Research, 41(7), 879-891.
Correction
Retraction Watch. (2016). Dispute over shooter video games may kill recent paper.
Loss of Confidence Project
The Loss of Confidence Project was created to make it easier for researchers to publicly acknowledge when they no longer have confidence in the main findings of their own published psychological studies. The project focuses on cases in which researchers experience a clear shift in belief about the validity of a central result after identifying theoretical or methodological problems in their work. To be included, submitting authors had to take responsibility for the identified issues and explain why the original conclusions could no longer be supported.
The first report from the project summarized 13 loss-of-confidence statements spanning a wide range of psychological topics and research methods (Rohrer et al., 2021). The original articles varied widely in visibility and citation counts, suggesting that loss of confidence extends to high-impact publications. Reasons for withdrawing confidence fell into three primary categories: methodological errors, such as programming mistakes or misspecified models; invalid inferences, where conclusions were not justified by the reported analyses; and p-hacking or unrecognized analytic flexibility. The authors noted that while doubts about one’s own findings are relatively common, they are rarely shared publicly. By encouraging transparent disclosure, the project highlights how individual researchers’ self-corrections can contribute to improving the reliability of psychological science.
Project Overview
Loss of Confidence Project. https://lossofconfidencecom.wordpress.com/
First Findings
Rohrer, J. M., Tierney, W., Uhlmann, E. L., DeBruine, L. M., Heyman, T., Jones, B., Schmukle, S. C., Silberzahn, R., Willén, R. M., Carlsson, R., Lucas, R. E., Strand, J., Vazire, S., Witt, J. K., Zentall, T. R., Chabris, C. F., & Yarkoni, T. (2021). Putting the Self in Self-Correction: Findings From the Loss-of-Confidence Project. Perspectives on Psychological Science, 16(6), 1255–1269.
Therapy Ratings & Attachment Styles
Attachment-based models of psychotherapy propose that therapists’ and clients’ attachment styles shape how each perceives the therapeutic relationship, particularly the quality of the working alliance, a core predictor of treatment effectiveness. Drawing on this framework, O’Connor et al. (2019) tested whether attachment styles were associated with agreement between therapists’ and clients’ alliance ratings. Using hierarchical linear modeling on data from 158 clients and 27 therapists at a community clinic, the study found systematic disagreement in alliance ratings, with therapists generally rating the alliance less positively than clients. Greater agreement was observed when therapists reported lower attachment avoidance and when therapist and client attachment styles were more similar in levels of anxiety or avoidance.
The article was later retracted following an investigation by the University of Maryland Institutional Review Board, which determined that the dataset included data from clients who had not consented to its use for research purposes. Although O’Connor was not responsible for obtaining consent, the inclusion of unauthorized data invalidated the findings, leading the coauthors to request retraction. As a result, the study’s conclusions can no longer be considered reliable, and no IRB-approved replications have been conducted.
Original Article
O'Connor S, Kivlighan DM, Hill CE, Gelso CJ. Therapist-client agreement about their working alliance: Associations with attachment styles. J Couns Psychol. 2019 Jan;66(1):83-93. doi: 10.1037/cou0000303. Epub 2018 Aug 9. PMID: 30091622.
Retraction
Retraction of O'Connor et al. (2019) [retraction of: J Couns Psychol. 2019 Jan;66(1):83-93. doi: 10.1037/cou0000303.]. J Couns Psychol. 2023;70(4):449. doi:10.1037/cou0000676
Verbal Overshadowing
The verbal overshadowing effect refers to the finding that verbally describing a previously seen visual stimulus can impair subsequent recognition of that stimulus. First reported by Schooler and Engstler-Schooler (1990), the effect was demonstrated across a series of experiments in which participants viewed faces or other visual materials and were either asked to describe them verbally or not before completing a recognition task. Participants who engaged in verbal description consistently showed poorer recognition performance than those who only viewed the stimuli. The effect extended beyond faces to difficult-to-verbalize stimuli such as colors, leading the authors to propose the recoding interference hypothesis, whereby verbalization creates a linguistically biased representation that interferes with the original visual memory.
Despite strong initial findings, later work raised questions about the robustness of the effect. Schooler (2011) noted increasing difficulty replicating the original effect size, contributing to broader discussion of the “decline effect,” in which effect magnitudes weaken as studies are replicated over time. A meta-analysis by Meissner and Brigham (2001) found evidence for verbal overshadowing but reported a substantially smaller effect size, approximately a 12% reduction in recognition accuracy. More recently, a large preregistered, multi-lab direct replication found that all replication attempts produced smaller effect size estimates than the original study (Alogna et al., 2014), although many fell within the wide confidence interval of the original experiments. In response, Schooler (2014) identified a key procedural deviation in early replications involving the timing of verbalization, noting that verbalization occurred immediately after viewing rather than following the original 20-minute delay. When this timing was restored, a subsequent replication produced effect sizes comparable to the original findings.
Original Article
Schooler, J. W., & Engstler-Schooler, T. Y. (1990). Verbal overshadowing of visual memories: Some things are better left unsaid. Cognitive Psychology, 22(1), 36-71.
Author Statement on Decline Effect
Schooler, J. (2011). Unpublished results hide the decline effect. Nature, 470, 437.
Corrections
Meissner, C. A., & Brigham, J. C. (2001). A meta-analysis of the verbal overshadowing effect in face identification. Applied Cognitive Psychology, 15(6), 603–616.
Alogna, V. K., Attaya, M. K., Aucoin, P., Bahník, Š., Birch, S., Birt, A. R., Bornstein, B. H., ... (2014). Registered Replication Report: Schooler and Engstler-Schooler (1990). Perspectives on Psychological Science, 9(5), 556–578.
Author Response
Schooler, J. W. (2014). Turning the Lens of Science on Itself: Verbal Overshadowing, Replication, and Metascience. Perspectives on Psychological Science, 9(5), 579–584.
Women’s Preference for Attractive Makeup and Changes in Salivary Testosterone Levels
This study examined whether women’s preferences for attractive makeup vary with changes in their salivary testosterone levels. Fisher and colleagues (2015) reported that within-person fluctuations in testosterone were associated with preferences for more attractive makeup, suggesting that hormonal variation may influence appearance-related preferences. Published in Psychological Science, the findings were interpreted as evidence that endocrine factors shape women’s aesthetic judgments and self-presentation choices.
However, the study was later retracted after concerns were raised about its statistical modeling approach. Specifically, the original analyses did not include random slopes for key within-subject variables, such as makeup attractiveness and testosterone levels, which are necessary to account for individual differences in how predictors relate to outcomes. Omitting random slopes can substantially inflate false positive rates in multilevel models, particularly in repeated-measures designs. After reanalyzing the data with appropriate model specifications, the authors found that the primary effect no longer reached statistical significance. In response, the research team requested a retraction, emphasizing the importance of open data, shared analysis code, and post-publication scrutiny in identifying analytic errors and correcting the scientific record (Rohrer et al., 2021).
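The claim that ignoring between-person variation in slopes inflates false positives can be sketched with a simulation. This is a simplified stand-in, not the authors' multilevel model or their reanalysis: it uses plain least squares instead of a mixed model, leaves out random intercepts for brevity, and every sample size, variance, and function name is a hypothetical choice. Each simulated person has their own slope relating a predictor to an outcome, and the slopes average zero, so there is no true overall effect. One analysis pools all observations into a single regression (no allowance for varying slopes); the other estimates a slope per person and tests whether those slopes differ from zero on average.

```python
import random


def ols_slope(xs, ys):
    """Return (slope, standard error) from a simple least-squares fit of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = (rss / (n - 2) / sxx) ** 0.5
    return slope, se


def slope_false_positive_rates(n_sims=1000, seed=7):
    """Return (false-positive rate of the pooled regression,
    false-positive rate of the per-person slope test)."""
    n_subjects, n_obs = 20, 30  # hypothetical design
    slope_sd = 0.5              # hypothetical spread of person-specific slopes
    crit_pooled = 1.964    # two-sided 5% critical t for df = 20*30 - 2 = 598
    crit_subjects = 2.093  # two-sided 5% critical t for df = 20 - 1 = 19
    rng = random.Random(seed)
    pooled_hits = 0
    per_person_hits = 0
    for _ in range(n_sims):
        all_x, all_y, person_slopes = [], [], []
        for _person in range(n_subjects):
            true_slope = rng.gauss(0, slope_sd)  # averages zero: no real effect
            xs = [rng.gauss(0, 1) for _ in range(n_obs)]
            ys = [true_slope * x + rng.gauss(0, 1) for x in xs]
            all_x.extend(xs)
            all_y.extend(ys)
            person_slopes.append(ols_slope(xs, ys)[0])
        # Analysis 1: one regression over all observations, slopes assumed equal.
        b, se = ols_slope(all_x, all_y)
        if abs(b / se) > crit_pooled:
            pooled_hits += 1
        # Analysis 2: one-sample t-test on the per-person slope estimates.
        m = sum(person_slopes) / n_subjects
        sd = (sum((s - m) ** 2 for s in person_slopes) / (n_subjects - 1)) ** 0.5
        if abs(m / (sd / n_subjects ** 0.5)) > crit_subjects:
            per_person_hits += 1
    return pooled_hits / n_sims, per_person_hits / n_sims


if __name__ == "__main__":
    pooled, per_person = slope_false_positive_rates()
    print(f"pooled regression:     {pooled:.3f}")
    print(f"per-person slope test: {per_person:.3f}")
```

With these assumed settings the pooled analysis declares a significant effect much more often than 5% of the time, whereas the per-person test stays near the nominal rate. A mixed model with random slopes addresses the same problem within a single model.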
Original Article
Fisher, C. I., Hahn, A. C., DeBruine, L. M., & Jones, B. C. (2015). RETRACTED: Women’s Preference for Attractive Makeup Tracks Changes in Their Salivary Testosterone. Psychological Science, 26(12), 1958–1964.
Correction
Rohrer, J. M., Tierney, W., Uhlmann, E. L., DeBruine, L. M., Heyman, T., Jones, B., Schmukle, S. C., Silberzahn, R., Willén, R. M., Carlsson, R., Lucas, R. E., Strand, J., Vazire, S., Witt, J. K., Zentall, T. R., Chabris, C. F., & Yarkoni, T. (2021). Putting the Self in Self-Correction: Findings From the Loss-of-Confidence Project. Perspectives on Psychological Science, 16(6), 1255–1269.