Methods and explanation: Evidence
It is well known that many human traits follow a Normal, or Gaussian, distribution. For example, the following figure presents the distribution of heights (in centimeters) of 4,635 American men. These men were part of a national sample obtained in 1971 and 1974. The figure presents both a histogram that describes the actual heights in the sample and a smoothed curve that approximates the shape of the histogram. The smooth curve is a Gaussian distribution.
It occurred to me that athletic performance might follow such a distribution. In general, a Gaussian distribution is likely to hold when:
- the response variable is the sum of a large number of smaller random variables, for example:
  - the height of a person is the sum of shin + thigh + trunk + ..., all of which are controlled by sets of genes
  - the IQ of a person is the sum of a large number of individual components of intelligence
  - the weight of passengers and luggage on a plane
- the response variable is the mean of a large number of smaller random variables, for example:
  - sales per customer of small food objects
Athletic performance does indeed stem from a large number of variables: lung power, leg power, attitude, coaching, etc. One should therefore expect a Gaussian distribution to hold for running speed, for example. In particular, we are interested in the distribution of "top marks", e.g. season-best times for running. If these follow a Gaussian distribution, the probability of finding a mark falls as the mark gets better. We identify the probability of a mark with the inverse of the size of the population associated with it.
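As a quick check of the "sum of many small variables" condition, the following is a minimal simulation sketch. It is not part of the original analysis: the number of parts (30), the uniform distribution of each part, and the function name `simulate_sums` are all illustrative assumptions.

```python
import random
import statistics


def simulate_sums(n_parts=30, n_people=10_000, seed=1):
    """Return n_people totals, each the sum of n_parts small uniform random parts.

    Hypothetical helper: the uniform parts stand in for the many small
    contributions (leg power, lung power, ...) mentioned in the text.
    """
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n_parts)) for _ in range(n_people)]


totals = simulate_sums()
mean = statistics.fmean(totals)
sd = statistics.pstdev(totals)

# For a Gaussian, about 68.3% of values lie within one standard deviation
# of the mean; the simulated sums should come close to that figure.
within_one_sd = sum(abs(t - mean) <= sd for t in totals) / len(totals)
print(f"mean={mean:.2f}  sd={sd:.2f}  fraction within 1 sd={within_one_sd:.3f}")
```

Even though each individual part is uniformly distributed, the totals cluster around the mean in roughly the proportion a Gaussian predicts.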
Consider a specific example. The best 100 m time by a US high school runner last year was 10.14 s. This corresponds to a speed of distance/time = 100/10.14 = 9.86 m/s. The probability of such a mark would be 1/(8 million), since there were 8 million high school boys in the US in 2000.
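The arithmetic in this example can be reproduced in a few lines. The last step, which converts the probability into a number of standard deviations above the mean of a standard Gaussian, is an assumption about how such a probability might be used in a fit; the text does not spell out that step.

```python
from statistics import NormalDist

distance_m = 100.0
time_s = 10.14
population = 8_000_000  # US high school boys in 2000, as stated in the text

speed = distance_m / time_s      # metres per second
probability = 1.0 / population   # probability identified with 1/population

# Assumption: treat the mark as sitting in the upper tail of a standard
# normal distribution. By symmetry, the upper-tail z-score for probability p
# is the negative of the lower-tail quantile at p.
z = -NormalDist().inv_cdf(probability)

print(f"speed = {speed:.2f} m/s")
print(f"probability = {probability:.3g}")
print(f"z = {z:.2f} standard deviations above the mean")
```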
It is important, however, to be clear about the sample. A group of sumo wrestlers would not have the same athletic performance distribution as a team of gymnasts, for example.
I therefore restricted the analysis to high school students in California and the US. The analysis requires a series of points, each giving the best performance in a group of known size. The sizes of the populations for the marks used are shown in the following table. For "HS varsity", I have taken 333, assuming three runners from a school of 1000. "League" is 8 schools of 1000 students each, so 8th in league is one in 1000, etc.
Estimated size of populations for each mark.
|Mark|Population|
|---|---|
|8th in League|1,000|
|8th in CCS|13,000|
|50th in CA State|20,000|
|3rd in CCS|35,000|
|10th in CA State|100,000|
|10th in US|800,000|
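The population assigned to each mark follows from dividing the size of the group by the rank of the mark within it. The sketch below applies that rule to the three cases whose group sizes are stated in the text; the function name `population_per_rank` is hypothetical, and the CCS and California group sizes are not given in the text, so they are left out.

```python
def population_per_rank(group_size, rank):
    """Size of the population in which a mark of the given rank counts as 'one in N'."""
    return group_size / rank


# (group size, rank) pairs taken from figures stated in the text.
examples = {
    "HS varsity": (1_000, 3),         # three runners from a school of 1000
    "8th in League": (8 * 1_000, 8),  # 8 schools of 1000 students each
    "10th in US": (8_000_000, 10),    # 8 million US high school boys
}

populations = {
    label: population_per_rank(size, rank)
    for label, (size, rank) in examples.items()
}

for label, pop in populations.items():
    # The probability of the mark is identified with 1/population.
    print(f"{label}: one in {pop:,.0f} (probability {1 / pop:.2e})")
```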
Although records might seem a useful source of data, the sample size associated with a given record is hard to assess. Instead, I have used ranking lists, which give the best performances in a given year. In particular, I have used the year 2000 ranking lists for California and US high schools. These are suitable because the populations are readily obtainable from census data. To these "hard" numbers, I have added data based on my own observations of high school performance in San Jose, for schools that compete in the Blossom Valley League and the Central Coast Section (CCS) of the California Interscholastic Federation. The table gives the actual populations used.
Good fits are obtained for all events using this approach. An example, for the high hurdles, is shown in the figure below.
- "California Track and Running News 2000 California State Best Marks List"
- "DyeStat 2000 Outdoor US Rankings"
- "US Census Bureau Population Estimates"