The Critics of the Manning Hypothesis — Voracek, Berenbaum, and Hönekopp
Popular content often presents 2D:4D as a "scientifically established hormone marker." The actual academic landscape is far more divided. Since Manning's 1998 paper, several researchers have repeatedly challenged the hypothesis while engaging directly with its data. This article surveys the most influential critics — Martin Voracek, Sheri Berenbaum, and Hönekopp & Watson — and the points they raise.
1. What is being challenged?
Simplified, Manning's hypothesis says: (1) 2D:4D correlates with prenatal testosterone exposure, (2) measuring an adult's 2D:4D therefore provides an indirect window into their fetal hormonal environment, and (3) that environment links to personality, athletic ability, cognitive style, sexual orientation, and so on.
Critics mostly attack (2) and (3): is the signal really strong enough to license those inferences?
2. Martin Voracek — the most systematic skeptic
Martin Voracek at the University of Vienna has been the most consistent critic. He conducts 2D:4D research himself, yet he keeps returning to the same set of methodological concerns:
- Publication bias. Significant 2D:4D findings are far more likely to be published than null results, many of which likely sit unpublished in file drawers.
- Effect-size inflation. Smaller samples report larger effect sizes (funnel-plot asymmetry), the pattern expected when small studies reach print mainly because chance pushed their estimates upward.
- Multiple comparisons. When dozens of personality and behavioral variables are tested in the same dataset, a few will reach significance by chance — and those tend to be the ones that get reported.
Voracek's stance is, roughly: "2D:4D itself is an interesting indicator, but a substantial fraction of the published behavioral correlations are not trustworthy."
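The multiple-comparisons and file-drawer concerns above can be made concrete with a rough sketch. It is illustrative only: the function names, the 20-per-group sample size, and the "publish if |t| > 1.96" rule are assumptions made for this example, not figures or procedures taken from Voracek's papers.

```python
import random
import statistics


def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n_tests independent
    tests when no real effect exists in any of them."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate_file_drawer(n_studies: int = 2000, n_per_group: int = 20,
                         true_d: float = 0.0, seed: int = 0):
    """Simulate two-group studies and 'publish' only the significant ones.

    Returns (mean |d| over all studies, mean |d| over published studies,
    share of studies published). All settings are hypothetical.
    """
    rng = random.Random(seed)
    all_d, published_d = [], []
    for _ in range(n_studies):
        a = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        pooled_sd = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
        d = (statistics.mean(a) - statistics.mean(b)) / pooled_sd
        # Equal-n two-sample t statistic; 1.96 is a normal approximation
        # to the two-sided 5% critical value (an assumed publication rule).
        t = d * (n_per_group / 2) ** 0.5
        all_d.append(abs(d))
        if abs(t) > 1.96:
            published_d.append(abs(d))
    mean_published = statistics.mean(published_d) if published_d else float("nan")
    return statistics.mean(all_d), mean_published, len(published_d) / n_studies


if __name__ == "__main__":
    print(f"20 tests, alpha 0.05: {familywise_error_rate(20):.2f}")
    mean_all, mean_pub, share = simulate_file_drawer()
    print(f"mean |d| all: {mean_all:.2f}, published: {mean_pub:.2f}, share: {share:.2%}")
```

With 20 independent tests at alpha = 0.05, the chance of at least one spurious hit is about 0.64. In the simulation the true effect is zero, yet every "published" study has |d| above roughly 0.62 by construction, which is how a literature of small samples can look stronger than the underlying signal.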
3. Hönekopp & Watson — effect sizes are small
The 2010 meta-analysis by Hönekopp & Watson is the most frequently cited comprehensive review. Its conclusion is double-edged:
- What it supports: The sex difference in 2D:4D is real. The right-hand effect size (d ≈ 0.5) replicates consistently.
- What it criticizes: Behavioral and personality correlations show much smaller effect sizes (d ≈ 0.1–0.2). At that level, statistical significance does not translate into useful individual-level prediction.
This paper helped shape the field's balanced consensus: the sex difference is real, but treating one person's 2D:4D as a readout of that person's prenatal hormones is overreach.
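To see why effect sizes of this magnitude give little individual-level prediction, it helps to convert Cohen's d into classification terms. The sketch below assumes two equal-sized groups with normal distributions and equal variance, a simplification chosen for illustration. The function names are hypothetical and nothing here is taken from the meta-analysis itself.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def best_cutoff_accuracy(d: float) -> float:
    """Share of people classified correctly by a cutoff at the midpoint
    between two equal-sized, equal-variance normal groups separated by d."""
    return normal_cdf(abs(d) / 2.0)


def overlap_coefficient(d: float) -> float:
    """Shared area under the two group distributions (1.0 = identical)."""
    return 2.0 * normal_cdf(-abs(d) / 2.0)


if __name__ == "__main__":
    for d in (0.5, 0.2, 0.1):
        print(f"d = {d}: accuracy {best_cutoff_accuracy(d):.3f}, "
              f"overlap {overlap_coefficient(d):.3f}")
```

Under these assumptions, d = 0.5 lets the best possible cutoff classify about 60% of people correctly, with roughly 80% of the two distributions overlapping. At d = 0.1 the accuracy is about 52%, barely better than a coin flip.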
4. Sheri Berenbaum — skepticism from a hormone-development specialist
Sheri Berenbaum (Penn State) studies prenatal androgen effects directly through congenital adrenal hyperplasia (CAH), a condition that exposes fetuses to abnormally high androgens. If 2D:4D were a reliable hormone marker, CAH females should show a clear shift.
Her work confirms that CAH females show slightly lower 2D:4D, but the effect is small. Her conclusion: 2D:4D is somewhat related to prenatal hormones, but variability among individuals exposed to similar hormone environments is large enough that 2D:4D cannot accurately estimate any one person's prenatal hormonal environment.
5. Measurement problems
Another line of critique concerns measurement reliability itself. Independent raters measuring the same hand typically differ by 0.01–0.02 in the ratio, which is comparable to the average sex difference. Self-measurement and photo-based methods produce even more variation. Voracek frames this as a low signal-to-noise ratio: when measurement noise approaches the size of the effect, small reported effects are hard to take seriously.
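The signal-to-noise argument can be sketched numerically. Every number below is an assumption made for illustration: a between-person standard deviation of 0.03, a mean sex difference of 0.015, and rater disagreement of 0.01 to 0.02 treated as the standard deviation of measurement error. The function names are hypothetical too.

```python
import math


def reliability(true_sd: float, error_sd: float) -> float:
    """Share of observed variance that reflects real between-person
    differences rather than measurement error."""
    true_var = true_sd ** 2
    return true_var / (true_var + error_sd ** 2)


def observed_d(mean_diff: float, true_sd: float, error_sd: float) -> float:
    """Standardized group difference after measurement error inflates
    the observed spread."""
    return mean_diff / math.sqrt(true_sd ** 2 + error_sd ** 2)


# Assumed illustrative values, not estimates from the cited studies.
TRUE_SD = 0.03
MEAN_DIFF = 0.015

if __name__ == "__main__":
    for error_sd in (0.0, 0.01, 0.02):
        print(f"error SD {error_sd}: reliability "
              f"{reliability(TRUE_SD, error_sd):.2f}, "
              f"observed d {observed_d(MEAN_DIFF, TRUE_SD, error_sd):.2f}")
```

With these assumed values, an error standard deviation of 0.02 lowers the observed d from 0.5 to about 0.42. For a single hand, the error is then as large as the group difference itself, so one measurement says little about which side of the average a person falls on.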
6. Limits of causal inference
The link between 2D:4D and prenatal hormones is reasonably clear in animal models (rodent experiments). In humans, however, prenatal hormone exposure cannot be measured directly, so every link rests on indirect evidence. Hormone-development researchers such as Berenbaum et al. (2009) and Wallen (2009) repeatedly stress this distinction: showing that 2D:4D correlates with prenatal hormones is not the same as showing that the correlation is strong enough to predict an individual's hormonal environment.
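The gap between "correlated" and "predictive" can be illustrated with standard regression arithmetic. The correlation values below are hypothetical and are not estimates from the cited papers. The sketch assumes a simple linear relationship with normally distributed errors, and the function names are made up for this example.

```python
import math


def variance_explained(r: float) -> float:
    """Share of variance in the outcome accounted for by the predictor."""
    return r ** 2


def residual_sd_fraction(r: float) -> float:
    """Spread of prediction errors as a fraction of the outcome's
    original spread (1.0 means the predictor adds nothing)."""
    return math.sqrt(1.0 - r ** 2)


if __name__ == "__main__":
    for r in (0.1, 0.3, 0.5):  # hypothetical correlations
        print(f"r = {r}: variance explained {variance_explained(r):.2f}, "
              f"residual SD fraction {residual_sd_fraction(r):.3f}")
```

Even at a hypothetical r = 0.3, knowing the predictor narrows the uncertainty about an individual by less than 5%: the residual spread is still about 95% of the original.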
7. Are the critics simply rejecting it all?
A note on balance: none of these scholars calls 2D:4D outright pseudoscience. The sex difference and some between-population differences are accepted. What they reject is the extended reading, namely that 2D:4D can read out a person's personality, fate, or identity. The distinction is subtle but decisive.
8. The current consensus
As of 2025, the field's shared view is roughly:
- The 2D:4D sex difference is real, with a moderate effect size.
- It is somewhat tied to prenatal hormones, but not a precise individual-level marker.
- Most personality and behavior correlations are small in magnitude, and several have failed to replicate.
- Strong popular claims of the form "2D:4D reveals your personality" are not the academic consensus.
The critics' message is essentially: an interesting indicator, but one to handle with humility. Please weigh the result of this service in roughly the same spirit.
Key References
- Hönekopp J, Watson S (2010). Meta-analysis of digit ratio 2D:4D shows greater sex difference in the right hand. American Journal of Human Biology, 22(5).
- Voracek M, Loibl LM (2009). Scientometric analysis and bibliography of digit ratio (2D:4D) research. Psychological Reports, 104.
- Berenbaum SA et al. (2009). Fingers as a marker of prenatal androgen exposure. Endocrinology, 150(11).
- Wallen K (2009). Does finger fat produce sex differences in second to fourth digit ratios? Endocrinology, 150(11).