Methods and explanation: Evidence
It is well known that many human traits follow a Normal, or Gaussian, distribution. For example, the following figure presents the distribution of heights (in centimeters) of 4,635 American men. These men were part of a national sample obtained in 1971 and 1974. The figure shows both a histogram of the actual heights in the sample and a smooth curve that approximates the shape of the histogram; the smooth curve is a Gaussian distribution.

It occurred to me that athletic performance might follow such a distribution. In general, a Gaussian distribution is likely to hold when:

  • the response variable is the sum of a large number of smaller random variables
    • the height of a person is the sum of shin + thigh + trunk + ..., each of which is controlled by a set of genes
    • the IQ of a person is the sum of intelligence across a large number of individual attributes
    • the total weight of passengers and luggage on a plane

  • the response variable is the mean of a large number of smaller random variables
    • sales per customer of small food items
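
The first condition (a sum of many small, independent contributions) can be illustrated with a short simulation. This is a minimal sketch, assuming each contribution is an independent Uniform(0, 1) variable; the function names and parameter values are hypothetical and not taken from the text.

```python
import random
import statistics


def sum_of_small_effects(n_components: int, rng: random.Random) -> float:
    # Each component is a small, independent, deliberately non-Gaussian
    # (uniform) effect, standing in for e.g. shin, thigh and trunk lengths.
    return sum(rng.uniform(0.0, 1.0) for _ in range(n_components))


def simulate(n_components: int = 50, n_people: int = 10_000, seed: int = 1) -> list[float]:
    # One total per simulated person; a fixed seed keeps the run reproducible.
    rng = random.Random(seed)
    return [sum_of_small_effects(n_components, rng) for _ in range(n_people)]


if __name__ == "__main__":
    totals = simulate()
    mean = statistics.fmean(totals)
    sd = statistics.pstdev(totals)
    # For a sum of n Uniform(0, 1) terms, theory gives mean n/2 and sd sqrt(n/12).
    print(f"mean = {mean:.2f} (theory 25.00), sd = {sd:.2f} (theory {(50 / 12) ** 0.5:.2f})")
    # A Gaussian puts about 68.3% of its mass within one sd of the mean.
    within = sum(abs(x - mean) <= sd for x in totals) / len(totals)
    print(f"fraction within 1 sd: {within:.3f}")
```

Even though each component is flat rather than bell-shaped, the totals come out close to Gaussian, which is the behaviour the conditions above rely on.
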
Athletic performance does indeed stem from a large number of variables: lung power, leg power, attitude, coaching, etc. One should therefore expect a Gaussian distribution to hold for running speed, for example. In particular, we are interested in the distribution of "top marks", e.g., season-best times for running. If these follow a Gaussian distribution, the probability of finding ever better marks will fall rapidly. We identify the probability of a mark with the inverse of the population: a mark that is the best among N people has probability 1/N.
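
Written out, this identification links population size to the expected best mark. A sketch, assuming speeds v are Gaussian with mean μ and standard deviation σ (symbols introduced here for illustration, with Φ the standard normal cumulative distribution function and v_N the best speed in a population of N):

```latex
P(V \ge v_N) = 1 - \Phi\!\left(\frac{v_N - \mu}{\sigma}\right) = \frac{1}{N}
\quad\Longrightarrow\quad
v_N = \mu + \sigma\,\Phi^{-1}\!\left(1 - \frac{1}{N}\right)
```

Because Φ⁻¹(1 − 1/N) grows only slowly with N (roughly like √(2 ln N) for large N), each tenfold increase in population buys a smaller improvement in the best mark than the one before.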

Consider a specific example. The best 100 m time by a US high school runner last year was 10.14 s. This corresponds to a speed of distance/time = 100/10.14 = 9.86 m/s. The probability of such a mark would be 1/(8 million), since there were 8 million high school boys in the US in 2000.
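
The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch; the function names are hypothetical and simply encode the two rules stated above (speed = distance/time, probability = 1/population).

```python
def speed_m_per_s(distance_m: float, time_s: float) -> float:
    # Average speed for a race mark.
    return distance_m / time_s


def probability_of_best_mark(population: int) -> float:
    # Probability identified with 1 / population, as in the text.
    return 1.0 / population


speed = speed_m_per_s(100.0, 10.14)
p = probability_of_best_mark(8_000_000)
print(f"speed = {speed:.2f} m/s, probability = {p:.3g}")  # speed = 9.86 m/s, probability = 1.25e-07
```
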

It is important, however, to be clear about the sample. A population of sumo wrestlers would not have the same athletic performance distribution as a team of gymnasts, for example. I therefore restricted the analysis to high school students in California and the US. The analysis requires a series of points, each giving the best performance in a group of known size. The sizes, or populations, for the marks used are shown in the following table. For "HS varsity", I have taken 333, assuming three varsity runners from a school of 1,000. For "League", I have assumed 8 schools of 1,000 students each, so 8th in the league is one in 1,000, and so on.
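
The population rules for "HS varsity" and "League" amount to dividing the size of a group by the rank of the mark within it. A minimal sketch, assuming (as the text implies) that the k-th best of G people counts as the best of G/k; the function name is hypothetical:

```python
def population_per_mark(group_size: int, rank: int) -> float:
    # Population N for which the rank-th best mark in a group of
    # group_size people is treated as "one in N".
    return group_size / rank


# HS varsity: three varsity runners from a school of 1,000 students.
varsity = population_per_mark(1_000, 3)
# League: 8 schools of 1,000 students each, 8th place in the league.
league = population_per_mark(8 * 1_000, 8)
print(round(varsity), round(league))  # 333 1000
```
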

Estimated size of populations for each mark.

Mark                Population
HS varsity                 333
8th in League            1,000
8th in CCS              13,000
50th in CA State        20,000
3rd in CCS              35,000
10th in CA State       100,000
10th in US             800,000
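
The table can be checked for internal consistency by inverting the rule above: population × rank should recover the size of the underlying group. The sketch below does this and also converts each population into the standard-normal z-score whose upper-tail probability is 1/population, which shows how far into the Gaussian tail each mark sits. The ranks are read from the mark names (3 for "HS varsity", following the text); the z-score conversion is an illustrative assumption, not something the text states, and the names are hypothetical.

```python
from statistics import NormalDist

# (mark, rank within its group, population from the table)
TABLE = [
    ("HS varsity", 3, 333),
    ("8th in League", 8, 1_000),
    ("8th in CCS", 8, 13_000),
    ("50th in CA State", 50, 20_000),
    ("3rd in CCS", 3, 35_000),
    ("10th in CA State", 10, 100_000),
    ("10th in US", 10, 800_000),
]


def implied_group_size(rank: int, population: int) -> int:
    # Inverts population = group_size / rank.
    return rank * population


def tail_z(population: int) -> float:
    # z such that P(Z >= z) = 1 / population for a standard normal Z.
    return NormalDist().inv_cdf(1.0 - 1.0 / population)


for mark, rank, pop in TABLE:
    print(f"{mark:18s} group ~ {implied_group_size(rank, pop):>9,d}  z = {tail_z(pop):.2f}")
```

The two CCS rows imply a section of about 104,000 to 105,000 students, the two CA State rows about 1,000,000, and the US row 8,000,000, matching the 8 million boys quoted for the 100 m example.
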


Although records might seem a useful source of data, the sample sizes associated with a given record are hard to assess. Instead, I have used ranking lists, which give the best performances in a given year. In particular, I have used the year 2000 ranking lists for California [1] and US [2] high schools. These are suitable because the corresponding populations are readily obtainable from census data [3]. To these "hard" numbers, I have added data based on my own observations of high school performance in San Jose, at schools that compete in the Blossom Valley League and in the Central Coast Section (CCS) of the California Interscholastic Federation. The table above gives the actual populations used.

Good fits are obtained for all events using this approach. An example, for the high hurdles, is shown in the figure below.
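
The text does not say how the fits were made. One plausible way to make "good fit" quantitative, assuming speeds are Gaussian with mean μ and standard deviation σ, is to note that the best mark in a population of N should then lie at μ + σ·z, where z = Φ⁻¹(1 − 1/N) and Φ is the standard normal cumulative distribution function. Speed plotted against z should then fall on a straight line, whose intercept and slope can be estimated by ordinary least squares. This is a sketch of that idea, not the author's procedure; all names are hypothetical.

```python
from statistics import NormalDist, fmean


def tail_z(population: float) -> float:
    # Standard-normal z whose upper-tail probability is 1 / population.
    # Requires population > 1, otherwise inv_cdf receives a probability of 0.
    return NormalDist().inv_cdf(1.0 - 1.0 / population)


def fit_gaussian_marks(populations: list[float], speeds: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of speed = mu + sigma * tail_z(population).

    Returns (mu, sigma), the mean and standard deviation of the assumed
    Gaussian distribution of speeds. Needs at least two distinct populations.
    """
    if len(populations) != len(speeds) or len(speeds) < 2:
        raise ValueError("need at least two (population, speed) pairs of equal length")
    zs = [tail_z(n) for n in populations]
    z_bar, v_bar = fmean(zs), fmean(speeds)
    sxx = sum((z - z_bar) ** 2 for z in zs)
    sxy = sum((z - z_bar) * (v - v_bar) for z, v in zip(zs, speeds))
    sigma = sxy / sxx  # slope of the fitted line
    mu = v_bar - sigma * z_bar  # intercept of the fitted line
    return mu, sigma


def predict_best_speed(mu: float, sigma: float, population: float) -> float:
    # Expected best speed in a population of the given size under the fit.
    return mu + sigma * tail_z(population)
```

In use, each ranked time would be converted to a speed (distance divided by time, as in the 100 m example) and paired with its population from the table; the residuals from the fitted line then indicate how well the Gaussian model holds for that event.
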




References:

  1. "California Track and Running News 2000 California State Best Marks List"
  2. "DyeStat 2000 Outdoor US Rankings"
  3. "US Census Bureau Population Estimates"
