The Critics of the Manning Hypothesis — Voracek, Berenbaum, and Hönekopp

2026-04-27

Popular content often presents 2D:4D as a "scientifically established hormone marker." The actual academic landscape is far more divided. Since Manning's 1998 paper, several researchers have repeatedly challenged the hypothesis while engaging directly with its data. This article surveys the most influential critics — Martin Voracek, Sheri Berenbaum, and Hönekopp & Watson — and the points they raise.

1. What is being challenged?

Simplified, Manning's hypothesis says: (1) 2D:4D correlates with prenatal testosterone exposure, (2) measuring an adult's 2D:4D therefore provides an indirect window into their fetal hormonal environment, and (3) that environment links to personality, athletic ability, cognitive style, sexual orientation, and so on.

Critics mostly attack (2) and (3): is the signal really strong enough to license those inferences?

2. Martin Voracek — the most systematic skeptic

Martin Voracek at the University of Vienna has been the most consistent critic. He conducts his own 2D:4D research but raises a recurring set of methodological concerns:

Voracek's stance is, roughly: "2D:4D itself is an interesting indicator, but a substantial fraction of the published behavioral correlations are not trustworthy."

3. Hönekopp & Watson — effect sizes are small

The 2010 meta-analysis by Hönekopp & Watson is the most frequently cited comprehensive review. Its conclusion is double-edged:

This paper helped shape the field's balanced consensus: the sex difference is real, but extending the inference to individual hormone signals is overreach.

4. Sheri Berenbaum — skepticism from a hormone-development specialist

Sheri Berenbaum (Penn State) studies prenatal androgen effects directly through congenital adrenal hyperplasia (CAH), a condition that exposes fetuses to abnormally high androgens. If 2D:4D were a reliable hormone marker, CAH females should show a clear shift.

Her work confirms that CAH females show slightly lower 2D:4D, but the effect is small. Her conclusion: 2D:4D is somewhat related to prenatal hormones, but variability among individuals exposed to similar hormone environments is large enough that 2D:4D cannot be used to estimate any one person's prenatal hormonal environment with accuracy.
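The gap between a group-level shift and an individual-level readout can be made concrete with a short calculation. The sketch below assumes two normal distributions with equal spread whose means differ by a standardized amount d (Cohen's d), and asks how often the best possible cutoff would assign a single person to the correct group. The d values are hypothetical illustrations, not figures from Berenbaum's data, and the function names are invented for this example.

```python
from statistics import NormalDist

# Assumption: both groups are normally distributed with the same SD,
# and their means differ by d standard deviations (Cohen's d).

def best_cutoff_accuracy(d):
    """Share of people classified correctly by a cutoff midway between the means
    (equal group sizes assumed)."""
    return NormalDist().cdf(abs(d) / 2)

def distribution_overlap(d):
    """Overlapping area of the two group distributions (1.0 = identical)."""
    return 2 * NormalDist().cdf(-abs(d) / 2)

# Hypothetical effect sizes, chosen only to show the shape of the relationship.
for d in (0.2, 0.5, 0.8, 2.0):
    print(f"d={d:.1f}  accuracy={best_cutoff_accuracy(d):.2f}  "
          f"overlap={distribution_overlap(d):.2f}")
```

Under these assumptions, a small or moderate group difference leaves the two distributions largely overlapping, so a single measurement says little about which group, or which hormonal environment, a given person came from.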

5. Measurement problems

Another line of critique concerns measurement reliability itself. Independent raters measuring the same hand typically differ by 0.01–0.02 in the ratio, which is comparable to the average sex difference. Self-measurement and photo-based methods produce even more variation. Voracek frames this as a low signal-to-noise ratio: when noise approaches the size of the effect, small reported effects are hard to take seriously.
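The signal-to-noise argument can be sketched numerically. The example below assumes that rater error is independent of the true ratio and adds to it, so the observed spread grows while the mean difference between groups stays the same. Every number in it (the mean difference, the true standard deviation, the error standard deviations, the "true" correlation) is an illustrative assumption rather than a published estimate, and the function names are hypothetical.

```python
import math

# Assumption: measured ratio = true ratio + independent rater error.

def observed_sd(true_sd, error_sd):
    """SD of measured ratios once rater error is added to the true ratios."""
    return math.sqrt(true_sd ** 2 + error_sd ** 2)

def observed_effect_size(mean_diff, true_sd, error_sd):
    """Cohen's d computed on noisy measurements (the mean difference is unchanged)."""
    return mean_diff / observed_sd(true_sd, error_sd)

def reliability(true_sd, error_sd):
    """Share of observed variance that reflects true between-person differences."""
    return true_sd ** 2 / (true_sd ** 2 + error_sd ** 2)

def attenuated_correlation(true_r, reliability_x):
    """Expected observed correlation when only the 2D:4D side is measured with error."""
    return true_r * math.sqrt(reliability_x)

# Illustrative inputs only: a 0.02 mean difference, a true SD of 0.03,
# and a hypothetical true correlation of 0.15 with some behavioral trait.
for error_sd in (0.0, 0.01, 0.02):
    rel = reliability(0.03, error_sd)
    print(f"error_sd={error_sd:.2f}  reliability={rel:.2f}  "
          f"d={observed_effect_size(0.02, 0.03, error_sd):.2f}  "
          f"r={attenuated_correlation(0.15, rel):.3f}")
```

As the assumed rater error grows, both the standardized group difference and any correlation with a behavioral measure shrink, which is why an already small effect becomes harder to detect and to replicate.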

6. Limits of causal inference

The 2D:4D – prenatal-hormone link is reasonably clear in animal models (rodent experiments). In humans, however, prenatal hormone exposure cannot be measured directly, so every link rests on indirect evidence. Hormone-development researchers like Berenbaum and Wallen (2009) repeatedly stress this distinction: showing that 2D:4D correlates with prenatal hormones is not the same as showing the correlation is strong enough to predict individual hormonal environments.

7. Are the critics simply rejecting it all?

An important point of balance: none of these scholars calls 2D:4D outright pseudoscience. They accept the sex difference and some between-population differences. What they reject is the extended reading, namely that 2D:4D can read out a person's personality, fate, or identity. The distinction is subtle but decisive.

8. The current consensus

As of 2025, the field's shared view is roughly:

  1. The 2D:4D sex difference is real, with a moderate effect size.
  2. It is somewhat tied to prenatal hormones, but not a precise individual-level marker.
  3. Most personality and behavior correlations are small in magnitude, and several have failed to replicate.
  4. Strong popular claims of the form "2D:4D reveals your personality" are not the academic consensus.

The critics' message is essentially: an interesting indicator, but one to handle with humility. Please weigh the result of this service in roughly the same spirit.

