Simulation Engines
There are a number of ways to develop a sports simulation engine, but all of them use statistical information from real sporting events as the basis of probability for the events to be simulated. The depth (number) and breadth (relevance) of the statistics used contribute to the likelihood of a simulation producing realistic results. This is why, if you are familiar with the genre, you see a number of excellent baseball simulations. The law of large numbers dictates that the more times a trial occurs, the more closely the results of those trials will resemble the mathematical probability of those events occurring. In baseball, for example, a player batting .300 over 500 AB is likely to keep hitting about .300, while a player batting .300 over 10 AB is less likely to keep hitting so well: 3/10 could be a fluke, while 150/500 is likely not.
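To put a rough number on the difference between 3/10 and 150/500, here is a minimal Python sketch using the usual binomial standard error, sqrt(p(1-p)/n). It assumes every at-bat is an independent trial with a fixed chance of a hit, which is a simplification, and the function name is only for illustration.

```python
import math

def batting_standard_error(hits, at_bats):
    """Standard error of an observed batting average, treating each
    at-bat as an independent trial (a simplifying assumption)."""
    p = hits / at_bats
    return math.sqrt(p * (1 - p) / at_bats)

small_sample = batting_standard_error(3, 10)     # .300 over 10 AB
large_sample = batting_standard_error(150, 500)  # .300 over 500 AB

print(f"10 AB:  .300 +/- {small_sample:.3f}")
print(f"500 AB: .300 +/- {large_sample:.3f}")
```

Under these assumptions the uncertainty on the 10 AB average is roughly seven times wider than on the 500 AB average, which is the sense in which 3/10 could be a fluke.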

The limited number of games played in a season of high school football naturally makes any attempted simulation less likely to represent true probability than an attempt with a sport like major league baseball. The variability of the competition at the high school level also weakens the simulation. That is, while I can be fairly certain the degree of talent and skill will not vary much from year to year in a professional league, I cannot be so sure that it will not vary a great deal from year to year in a high school league like the MVL. This varying level of competition from year to year may make a weaker team appear statistically stronger than it truly was on the field.

That being said, it is fun to try! So here's what I've done.

This Simulation Engine
I used points for per game (PF/g) and points against per game (PA/g) as the base data for each team. These numbers are intended to represent offensive and defensive strength, respectively. I then found the difference between the PF/g of each team and the PA/g of their opponents, and the difference between the PA/g of each team and the PF/g of their opponents. This step is intended to take into account the strength of a team's opponents with respect to offense and defense, respectively.

Offense:
Team      PF/G  -  OPP PA/GM
--------- ----     ---------
SH (2004) 50.5  -  20.64     = 29.86


Defense:
Team      PA/G  -  OPP PF/GM
--------- ----     ---------
SH (2004) 3.38  -  14.75     = (-11.38)

The number 29.86 tells me that the 2004 Sheridan team scored 29.86 more points per game than their opponents gave up on average, and the number (-11.38) tells me that the same team gave up 11.38 fewer points per game than their opponents scored on average.
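The subtraction step can be sketched in a few lines of Python. The function name is hypothetical, and the inputs are the rounded figures from the tables, so the defensive result prints as -11.37 rather than -11.38; presumably the published figure was computed from unrounded averages.

```python
def opponent_adjusted(team_per_game, opponents_per_game):
    """Difference between a team's per-game figure and what its
    opponents averaged on the other side of the ball."""
    return team_per_game - opponents_per_game

# 2004 Sheridan, using the rounded figures from the tables.
off_diff = opponent_adjusted(50.5, 20.64)   # PF/g minus opponents' PA/g
def_diff = opponent_adjusted(3.38, 14.75)   # PA/g minus opponents' PF/g

print(f"Offense: {off_diff:+.2f}")
print(f"Defense: {def_diff:+.2f}")
```

A positive offensive number and a negative defensive number both indicate a team that outperformed what its opponents usually allowed or scored.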

These numbers are then added to the league's average points per game (ppg) for the year in question, and the sum is divided by that same league average. In this case the average ppg was 18.72, so (29.86 + 18.72) / 18.72 = 2.59. This tells me that Sheridan, playing the average team of 2004, would score 2.59 times what the average team would score in the same year. The same is done for defense: (-11.38 + 18.72) / 18.72 = 0.39. This tells me that Sheridan, playing the average team of 2004, would allow only about 39% of what the average team would allow. This is how I've adjusted for era: a team scoring 50 points in an era where the average team was scoring 20 will likely be much weaker than a team scoring 50 points in an era where the average team was scoring 10.

These two numbers, 2.59 (off) and 0.39 (def), were then put into polynomial equations that represent a curve from the strongest team to the weakest team. The equations used are as follows:

Offense: -10.898 x^2 + 68.162 x - 5.2442
Defense: -0.8287 x^6 + 12.478 x^5 - 7.853 x^4 + 202.52 x^3 - 253.82 x^2 + 68.244 x + 94.614

Plugging the above numbers into the equations gives me an offensive and a defensive power rating for each team. In this case, Offense = 98.26 and Defense = 92.93.
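As a rough check of the arithmetic, the era adjustment and the offensive curve can be sketched in Python. The function and constant names are my own for illustration, and the inputs are the rounded figures given in the text, so the output may differ from the published ratings in the last digit or so. The defensive polynomial can be evaluated the same way with its own coefficients.

```python
LEAGUE_AVG_PPG = 18.72  # 2004 league average points per game, from the text

def era_ratio(diff, league_avg=LEAGUE_AVG_PPG):
    """Opponent-adjusted difference expressed as a multiple of the
    league-average points per game."""
    return (diff + league_avg) / league_avg

def offense_rating(x):
    """Offensive curve as printed: -10.898 x^2 + 68.162 x - 5.2442."""
    return -10.898 * x**2 + 68.162 * x - 5.2442

off_ratio = era_ratio(29.86)    # offensive difference for 2004 Sheridan
def_ratio = era_ratio(-11.38)   # defensive difference for 2004 Sheridan

print(f"Offensive ratio:  {off_ratio:.3f}")
print(f"Defensive ratio:  {def_ratio:.3f}")
print(f"Offensive rating: {offense_rating(off_ratio):.2f}")
```

Fed the unrounded offensive ratio, the offensive curve lands within a few hundredths of the 98.26 quoted for Sheridan.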

These ratings are then used to simulate possessions. The idea is that Sheridan, when given the ball, will score about 98.26 percent of the time unless the defense stops them; the rest of the time they lose possession through some fault of their own. They will stop an average offense 92.93 percent of the time. Pairing the randomized result of one team's offensive possession with the randomized result of the other team's defensive stand produces a score.

A score is determined to be either 7 or 3 points depending on a linear formula of offense vs. defense.

The number of possessions simulated per game is randomized between 12 and 18.
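A possession loop along these lines might look like the Python sketch below. Several details are assumptions rather than the engine's actual rules: that a possession scores only when the offense succeeds on its roll and the defense fails to make a stop on its own roll, that both teams get the same number of possessions, and that the 7-or-3 decision can be stood in for by a fixed placeholder probability (p_touchdown), since the linear formula is not spelled out here. All names are hypothetical.

```python
import random

def simulate_possession(off_rating, def_rating, p_touchdown, rng):
    """Return the points scored on one possession (0, 3, or 7).

    Assumed reading: the offense must succeed on its own roll AND
    the defense must fail to make a stop on its roll.
    """
    offense_succeeds = rng.random() * 100 < off_rating
    defense_stops = rng.random() * 100 < def_rating
    if not offense_succeeds or defense_stops:
        return 0
    # p_touchdown is a placeholder for the linear offense-vs-defense
    # formula, which is not given in the text.
    return 7 if rng.random() < p_touchdown else 3

def simulate_game(team_a, team_b, p_touchdown=0.8, rng=None):
    """Simulate one game; each team is an (off_rating, def_rating) pair.

    Assumes both teams get the same number of possessions, drawn
    uniformly from 12 to 18 inclusive.
    """
    if rng is None:
        rng = random.Random()
    possessions = rng.randint(12, 18)
    score_a = sum(simulate_possession(team_a[0], team_b[1], p_touchdown, rng)
                  for _ in range(possessions))
    score_b = sum(simulate_possession(team_b[0], team_a[1], p_touchdown, rng)
                  for _ in range(possessions))
    return score_a, score_b

sheridan = (98.26, 92.93)
average_team = (50.0, 50.0)   # hypothetical ratings, for illustration only
print(simulate_game(sheridan, average_team, rng=random.Random(2004)))
```

Passing a seeded random.Random makes a run repeatable, which helps when comparing changes to the ratings or to the scoring rule.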

Problems with this simulation:
To begin with, this simulation struggles to allow excellent offenses to score against defenses that are comparatively weak. I am working on a better methodology for deriving offensive and defensive strength numbers to alleviate this problem in future versions of the simulation. The problem only comes to light when pitting the very best defenses against the very best offenses, but that is usually the goal of simulation. Because of this problem, championship scores will likely be lower than those in the regular pools.

I would also like to make this simulation more probabilistically sound. You may have noticed that, to derive the probabilities used in the simulation, I am relying on polynomial equations, which are really ill-suited to the job they are doing here. I need to base the final probabilities (the Off and Def probability ratings) on something to do with game results as opposed to a mathematical curve. I'm still working on this.

I hope this has been somewhat informative. Keep in mind this is my first attempt at high school football simulation, and I have some ideas about how to make this simulation better. But for now, I hope you enjoy the results.