The Wanton Empiricist: Is the performance of trans athletes in the NCAA a good way to determine whether trans athletes have an advantage over cis women?

Recently Brynn Tannehill addressed (in a short Twitter thread) the question of whether trans athletes competing in women's divisions in sporting events have an advantage over cis women. Her key argument was that the performance of trans athletes in the NCAA is decisive proof that they do not:

"The NCAA has allowed transgender people to compete without surgery since 2011, and there has not been a single dominant transgender athlete anywhere in college sports."

"These constitute large scale, longitudinal tests of the system with millions of athletes as a sample, and the IOC and NCAA rules for transgender athletes are clearly sufficient to preserve the integrity of sports at this time."

Tannehill suggests that if trans athletes had such an advantage, we would see them dominating college sports. However, the fact that there are relatively few trans athletes (a point she acknowledges later in the thread) means that NCAA women's sport results are strongly biased against producing trans champions simply because of the dramatic difference in sample size between trans and cis women athletes.

To illustrate why this matters, let's pick an objectively judged event as an example: the long jump in track and field. There were 898 NCAA women's outdoor track and field programs in 2017 (source) so let's estimate that each program fields 3 long-jumpers, for a total of 2,694 athletes. How many of those athletes are trans? The NCAA does not appear to publish this statistic, but estimates of percentages of the US population who are trans range from 0.3-0.6% (source). For the sake of the example, let's round up and assume that a full 1% of our hypothetical women long jumpers are trans (meaning the remaining 99% are cis).

If we want to use NCAA women athletes as a sample, it's clear we have a severe imbalance in sample size when it comes to the trans/cis distinction. That can be a problem in a couple of ways. First, it means that in estimating, say, the average personal best long jump for each group (the individual PR), we will have a much better estimate of the cis average, and a much poorer estimate of the trans average. For our long jump example, we will have a sample of 2,667 cis women and 27 trans women. Our estimate of the trans average will have very large error bars, and it would take a large difference between the groups to stand out from random variation (i.e., it would take a big advantage for trans athletes to be statistically significant, so that we could know the difference wasn't just noise).
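To put rough numbers on those error bars, here is a minimal sketch of the arithmetic. It assumes the 2.2-foot standard deviation quoted later in this post for high school long jump also applies to both groups of NCAA jumpers, which is purely an assumption for illustration, and it uses a simple two-sided 5% z-test threshold (1.96 standard errors). All variable names are made up for this example.

```python
import math

PROGRAMS = 898            # NCAA women's outdoor track programs (from the post)
JUMPERS_PER_PROGRAM = 3   # the post's rough estimate
TRANS_SHARE = 0.01        # the post's deliberately generous round-up
SD_FEET = 2.2             # ASSUMPTION: high school SD reused for both groups

total = PROGRAMS * JUMPERS_PER_PROGRAM
n_trans = round(total * TRANS_SHARE)
n_cis = total - n_trans

# Standard error of each group's average PR: SD / sqrt(n).
se_trans = SD_FEET / math.sqrt(n_trans)
se_cis = SD_FEET / math.sqrt(n_cis)

# Standard error of the difference between the two averages.
se_diff = math.sqrt(se_trans**2 + se_cis**2)

# Smallest observed difference that would clear a two-sided 5% z-test.
min_detectable = 1.96 * se_diff

print(f"athletes: {total} total, {n_cis} cis, {n_trans} trans")
print(f"standard error of the mean: cis {se_cis:.3f} ft, trans {se_trans:.3f} ft")
print(f"difference needed to reach significance: {min_detectable:.2f} ft")
```

Almost all of the uncertainty in the comparison comes from the small group, so adding more cis athletes to the sample would barely tighten it.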

However, Tannehill isn't suggesting we calculate average performance, but rather that we look at the champions in individual sports; in other words, she argues that if trans athletes have an advantage, they should be winning a noticeable number of national championships. In this case, we aren't just uncertain of the correct number for trans athletes; rather, the imbalance in sample size makes it much less likely that a trans athlete will end up on top of a podium.

To win a championship, one has to literally be the best. In other words, for our long jump example, the champion will basically have the maximum PR for their part of the sample. For trans athletes to end up winning championships, the maximum PR for the trans women has to be greater than the maximum PR for the cis women. But the maximum value for any group (in, say, a normal distribution) depends quite directly on the sample size. All other things being equal (like the mean and standard deviation), it's tremendously unlikely for the maximum long jump PR from a sample of 27 trans women to equal or surpass the same maximum from a sample of 2,667 cis women.

An easy way to visualize this is to imagine sampling from the (large) pool of cis women. What if we randomly selected 27 cis women who do the long jump from the total pool of cis women long jumpers? How often would one of those 27 jumpers beat the entire pool of 2,667 cis women? Only when the longest jumper happened to be selected by chance - about 1% of the time (27 out of 2,667). Suppose we extend this to a situation where the trans and cis women have exactly the same average jump and the same "spread" in the distribution of their PRs (in other words, the same standard deviation). The 27 trans women are just as unlikely to win a championship as the subsample of 27 cis women.
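The thought experiment above can be checked numerically. The sketch below assumes PRs are normally distributed with the same spread in both groups, measured in standard deviation units. The function name and its `shift` parameter (how far the small group's average sits above the large group's) are hypothetical and exist only for this illustration.

```python
from statistics import NormalDist

def small_group_win_probability(n_small, n_large, shift=0.0, steps=20000, span=10.0):
    """Chance that the best of n_small draws from N(shift, 1) beats
    the best of n_large draws from N(0, 1), via a simple Riemann sum."""
    std = NormalDist()
    step = 2 * span / steps
    total = 0.0
    for i in range(steps + 1):
        x = -span + i * step
        # Density of the small group's best jump landing at x.
        best_small = n_small * std.pdf(x - shift) * std.cdf(x - shift) ** (n_small - 1)
        # Probability that every member of the large group is below x.
        all_large_below = std.cdf(x) ** n_large
        total += best_small * all_large_below * step
    return total

# Identical distributions: the small group should win about 27 times in 2,694.
p_equal = small_group_win_probability(27, 2667)
print(f"identical distributions: {p_equal:.4f} (27/2694 = {27 / 2694:.4f})")
```

Calling the function with a positive `shift` shows how large an average advantage the group of 27 would need before it started producing champions at a noticeable rate.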

This bias in extreme results (the longest jump is by definition an extreme result) toward larger pools of athletes is purely a matter of sampling. A larger sample will tend to produce more extreme values (both maximum and minimum). If you want to dig into the statistics more, you could check out this quick summary. For normally distributed performances, the expected maximum of a sample of 27 sits about 2 standard deviations above the mean, while the expected maximum of a sample in the thousands sits about 3.5 standard deviations above it, a gap of roughly one and a half standard deviations. In other words, as a rough approximation, our sample of 27 trans women athletes would have to have an average long jump about one and a half standard deviations greater than their cis women colleagues to have an equal probability of winning the event (i.e., half the time the champion would be trans, half the time cis). That is a huge difference. In high school long jump, the average for women is 16.5 feet with a standard deviation of 2.2 feet (from an admittedly secondary source - if anybody has better data I'd love to see it). So the trans women athletes would have to jump roughly three feet farther on average just to have even odds of producing the national champion, much less "dominate" the competition.
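For readers who want to check the gap between expected maxima themselves, here is a small sketch. It assumes standard normal performances, so the result would differ under another distribution. It also reuses the 2.2-foot high school standard deviation as a stand-in for NCAA jumpers, which is an assumption. The helper name `expected_max` is invented for this example.

```python
from statistics import NormalDist

def expected_max(n, steps=20000, span=10.0):
    """Expected maximum of n independent standard normal draws,
    computed with a simple Riemann sum over [-span, span]."""
    std = NormalDist()
    step = 2 * span / steps
    total = 0.0
    for i in range(steps + 1):
        x = -span + i * step
        # Density of the maximum of n draws: n * pdf(x) * cdf(x)^(n - 1).
        total += x * n * std.pdf(x) * std.cdf(x) ** (n - 1) * step
    return total

SD_FEET = 2.2  # ASSUMPTION: high school SD reused for NCAA jumpers

max_small = expected_max(27)
max_large = expected_max(2667)
gap_sd = max_large - max_small
gap_feet = gap_sd * SD_FEET

print(f"expected best of 27:    {max_small:.2f} SD above the mean")
print(f"expected best of 2,667: {max_large:.2f} SD above the mean")
print(f"gap: {gap_sd:.2f} SD, or about {gap_feet:.1f} ft")
```

The expected maximum grows slowly with sample size, but going from a few dozen athletes to a few thousand is a large enough jump to matter.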

Another example of this phenomenon is the documented increase in performance of national champions as the population of nations increases. Larger nations have more athletes - essentially a larger sample of the distribution of athletic performance - and tend to produce larger values for maximum performance. To extend the analogy to our example, one could think of the group of trans women competitors as a very small "country" that has difficulty beating the top athletes from a big "country" (the cis women) purely because of a small population.

In conclusion, evaluating the "dominance" of trans women athletes in NCAA competition is a really terrible way of assessing whether trans women athletes have an inherent advantage, due solely to the small number of trans women athletes. It would take a very large inherent advantage to show up as podium performances for trans women athletes. A better approach would be comparing average performance between cis and trans women athletes, though one might have difficulty detecting smaller inherent advantages above random statistical variation. Pooling all NCAA women athletes together might also obscure differences between disciplines that had different inherent advantages for either cis or trans women.

Finally, let me make one thing clear: I'm not drawing any conclusions about whether inherent advantages exist for trans women athletes. Quite the opposite, I'm pointing out how little we can know from the current history of trans women in NCAA competition given the small percentage of female athletes who are trans women. Obviously this is a question that matters to a great number of people, and we should be really careful not to draw unjustified conclusions from the data we have.

Monday, March 11, 2019
