Predicting Ofsted grades

This app estimates the probability of a secondary school being awarded each grade after an inspection by Ofsted. To use it, select the characteristics of the school you want a prediction for and then choose a prediction algorithm. The estimated probabilities are shown in the chart below. See the Methods tab for limitations.

Features description

There are six predictor variables and one outcome variable in the dataset used for this app. More information on the classification algorithms can be found in the Methods tab.

  • Region: this is the region of England where the school is located.

  • School type: this is the type of school according to its governance structure.

  • KS2 APS (key stage 2 average point score): this is the average point score achieved by the school's year 11 cohort (16-year-olds) when they were aged 11. It has been found to be a good predictor of the results 16-year-olds achieve in their key stage 4 tests.

  • Religious denomination: this is whether the school has a religious affiliation.

  • Gender composition: this is whether the school is single-sex or mixed in gender composition.

  • Total pupils: this is the number of pupils enrolled at the school.

  • Ofsted grade: this is the overall grade given to schools by the Office for Standards in Education, Children's Services and Skills (Ofsted) following a school inspection. Four grades are possible: Outstanding, Good, Requires Improvement and Inadequate. This is the outcome variable.



Two datasets were used in this Shiny app. Both come from the UK Department for Education's statistics page.

Machine learning algorithms

The caret package was used to fit the machine learning algorithms.

The school dataset was split into training (70%) and testing (30%) datasets. All models were fit using the same training parameters and the same random seeds. Defaults were used, except that models were tuned using 5-fold (rather than 10-fold) cross-validation without repeats. Parallel computation was employed using the parallel and doParallel packages.
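The split and resampling setup described above can be sketched with caret as follows. This is a minimal illustration rather than the app's actual code: the data frame, its column names, its values and the seed are all invented stand-ins for the real school dataset.

```r
library(caret)

# Hypothetical stand-in for the school dataset: column names, values and
# the seed are invented for illustration only.
set.seed(2024)
n <- 200
schools <- data.frame(
  ks2_aps      = rnorm(n, mean = 28, sd = 2),
  total_pupils = round(runif(n, min = 300, max = 1800)),
  ofsted_grade = factor(sample(
    c("Outstanding", "Good", "Requires Improvement", "Inadequate"),
    n, replace = TRUE, prob = c(0.2, 0.5, 0.2, 0.1)
  ))
)

# 70/30 split, stratified on the outcome so the grade mix is similar
# in the training and testing parts.
in_train  <- createDataPartition(schools$ofsted_grade, p = 0.7, list = FALSE)
train_set <- schools[in_train, ]
test_set  <- schools[-in_train, ]

# 5-fold cross-validation without repeats; other settings stay at defaults.
ctrl <- trainControl(method = "cv", number = 5)
```

Passing the same ctrl object to every call to train() is one way to keep the training parameters identical across models.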

The specific algorithms:

  • Linear Discriminant Analysis was fit using the lda method
  • Random Forest was fit using the rf method
  • Naive Bayes was fit using the naiveBayes method
  • K-Nearest Neighbours was fit using the knn method
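As a rough sketch of how several caret methods can be fit with the same seed and the same resampling settings, the example below trains two of the methods listed above and returns class probabilities. It assumes the built-in iris data as a stand-in for the school dataset, and the seeds are arbitrary. Only lda and knn are shown; the other two methods follow the same pattern but need additional packages installed.

```r
library(caret)

# Stand-in data: the built-in iris data replaces the school dataset here.
set.seed(1)  # arbitrary seed, not the one used for the app
in_train  <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train_set <- iris[in_train, ]
test_set  <- iris[-in_train, ]

# Same resampling settings for every model: 5-fold CV, no repeats.
ctrl <- trainControl(method = "cv", number = 5)

# Fit one model per caret method string, resetting the seed before each
# fit so that every model is tuned on the same cross-validation folds.
methods <- c("lda", "knn")
fits <- lapply(methods, function(m) {
  set.seed(42)  # arbitrary seed, reused for every model
  train(Species ~ ., data = train_set, method = m, trControl = ctrl)
})
names(fits) <- methods

# Class probabilities for a few held-out rows, one column per class.
predict(fits$lda, newdata = head(test_set, 3), type = "prob")
```

Each row of the returned data frame sums to 1, which is the form of output an app like this one would chart.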

For more information on these methods, see the caret package's GitHub page.

The best performer was the lda model (although its predictions as KS2 APS increases are very odd). However, none of the algorithms achieved an out-of-sample error rate under 40%, which indicates that other variables are needed to make better predictions and that the models need further tuning and refinement. Even so, the variables and models included here explain a reasonable amount of the variation in gradings. Three other algorithms were also implemented but left out of the app because they were less accurate than the four ultimately chosen. The choice of algorithms offered here is one of many examples of how statistical analyses and data aren't as “objective” as is often claimed.
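For readers unfamiliar with the term, the out-of-sample error rate is the share of test-set schools whose predicted grade differs from the actual grade. The base R sketch below works this out on ten made-up predictions; the vectors are hypothetical and are not results from the app.

```r
grades <- c("Outstanding", "Good", "Requires Improvement", "Inadequate")

# Hypothetical actual and predicted grades for ten test-set schools.
truth <- factor(c("Good", "Good", "Outstanding", "Requires Improvement",
                  "Good", "Inadequate", "Good", "Outstanding",
                  "Requires Improvement", "Good"), levels = grades)
predicted <- factor(c("Good", "Good", "Good", "Good",
                      "Good", "Requires Improvement", "Good", "Outstanding",
                      "Requires Improvement", "Outstanding"), levels = grades)

# Confusion matrix: rows are predicted grades, columns are actual grades.
cm <- table(Predicted = predicted, Actual = truth)

# Error rate = 1 - accuracy, where accuracy is the share on the diagonal.
error_rate <- 1 - sum(diag(cm)) / sum(cm)
error_rate
```

Four of the ten made-up predictions are wrong, so the error rate here is 0.4, the threshold mentioned above.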


You can find out more about building applications with Shiny here:

I referred to miningthedetails' Shiny app and hack-r's server.R code to help me build my app.

Finally, this app was partly inspired by the one created by Trevor Burton. My app extends Burton's: it uses a more recent dataset, lets the user choose a classification algorithm, and employs more variables in the prediction. To quote Burton:

“I've only included the things you don't have any control over, so you can see how wrong it is that these should be correlated with Ofsted judgements.”