How Accurate are Teacher Predictions of IB Scores?

Recently, an IB Coordinator wrote us the following:

"Your reports contain information about the accuracy of a teacher's predictions. A teacher has asked me how he can know how his accuracy compares with other teachers. Do you know if stats on the worldwide accuracy levels of teachers are available?"

We've received similar requests in the past. For example, a different coordinator asked:

"When we look at predictions, we honestly do not know what we should be expecting to see. Do you have a sense of what seems a reasonable expectation from teachers regarding accuracy?"

As far as we know, the IBO does not publish worldwide statistics related to the accuracy of teacher predictions.

At IB Score Reports, we see results from many different schools, so we decided to build and share a descriptive analysis of some of our data to help teachers and coordinators.

Our Analyses

We encourage you to read this entire article before viewing the following tables, but we're including links here, near the top, for convenience.

Predicted vs. Awarded - Subject Scores – 2013-2017 - 70 Schools

Predicted vs. Awarded - TOK and EE Scores – 2013-2017 - 70 Schools

Predicted vs. Awarded - Diploma Scores – 2013-2017 - 70 Schools

Method

We created a large sample by pooling the results of 70 different schools over the last five years, 2013 through 2017. Results were gathered from each school's CSV files.

Letter grades (used for the Extended Essay and TOK) were converted to numbers in order to perform calculations: A = 5, B = 4, and so on down to E = 1. If a teacher predicted a "C" on an extended essay and the student earned an "A", that would be counted as an underprediction by two points.
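For readers who want to run the same comparison on their own CSV exports, here is a minimal Python sketch of the conversion. It is illustrative only. The function names and the sign convention (predicted minus awarded, so positive values are overpredictions) are our assumptions, not part of any IB file format.

```python
# Letter grades used for EE and TOK, mapped to numbers as described above.
LETTER_TO_NUMBER = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}


def to_number(grade: str) -> int:
    """Convert a letter grade (A-E) or a numeric subject score (1-7) to an int."""
    grade = grade.strip().upper()
    if grade in LETTER_TO_NUMBER:
        return LETTER_TO_NUMBER[grade]
    return int(grade)


def prediction_error(predicted: str, awarded: str) -> int:
    """Return predicted minus awarded (hypothetical sign convention).

    Positive values are overpredictions, negative values are
    underpredictions, and zero is a perfectly accurate prediction.
    """
    return to_number(predicted) - to_number(awarded)


# The example from the text: predicted "C", awarded "A".
print(prediction_error("C", "A"))  # -2, an underprediction by two points
```

The same function handles course subjects, because numeric scores pass straight through `int()`.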

Characteristics of the Sample

The 70 schools represent a wide range of programs, from well established to relatively new, from large enrollment to small. The sample includes schools from Asia, Europe, Africa, the Middle East, U.S., Canada, and elsewhere. All of the schools participated in May exams, and all had results in each of the last five years.

In our combined sample of 70 schools over five years, we have...

143,874 scores from 207 different subjects
20,788 scores from 79 different EE subjects
20,810 TOK scores
20,548 diploma scores

Results

We want to stress that the primary purpose of this analysis is to serve as a resource for teachers to compare their prediction accuracy to a large norm.

Still, it is hard not to notice patterns and trends. Here are a few observations that we find interesting.

Collectively, the distributions of prediction accuracy look like this:

KEY FINDING

In aggregate, teachers are most accurate when predicting subject scores for courses, less so when predicting TOK scores, and least so when predicting EE scores.

Decreasing accuracy manifests itself in several ways:

The percentage of perfectly accurate predictions decreases: 50% for subject scores, 44% for TOK, 38% for EEs.

The average absolute difference between predicted and awarded scores increases: 0.5 for subjects, 0.6 for TOK, 0.8 for EEs. (For EEs, this means the average difference between predicted and awarded grades was nearly one full letter grade.)

The percentage of overpredictions increases: 27% for subjects, 35% for TOK, 44% for EEs.
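The three measures above can be computed from a list of prediction errors. The following is a minimal sketch, assuming each error has already been computed as predicted minus awarded in points. The function name and the sample errors are hypothetical, not drawn from our data.

```python
from typing import Dict, Sequence


def accuracy_summary(errors: Sequence[int]) -> Dict[str, float]:
    """Summarize prediction errors (predicted minus awarded, in points).

    "exact", "over" and "under" are shares between 0 and 1 that sum to 1;
    "mean_abs_diff" is the average absolute difference in points.
    """
    n = len(errors)
    if n == 0:
        raise ValueError("need at least one prediction")
    return {
        "exact": sum(e == 0 for e in errors) / n,
        "mean_abs_diff": sum(abs(e) for e in errors) / n,
        "over": sum(e > 0 for e in errors) / n,
        "under": sum(e < 0 for e in errors) / n,
    }


# Hypothetical errors for ten predictions in one subject.
errors = [0, 0, 1, -1, 0, 2, 1, 0, -1, 0]
summary = accuracy_summary(errors)
print(summary)  # exact 0.5, mean_abs_diff 0.6, over 0.3, under 0.2
```

Using absolute differences keeps over- and underpredictions from cancelling each other out in the average.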

That teachers are more accurate at predicting course subjects than EE subjects is somewhat surprising.

Predictions for course subjects are tricky. Accuracy depends on knowing what students are capable of, as well as how motivated they are to perform their best. For some students, IB scores will not affect their university admission, so their motivation when they sit the exams may not be as high as we would like.

But predictions for Extended Essays are entirely different. There, teachers see the completed work, so they are not predicting future performance. In effect, they are not making predictions at all; they are simply scoring the essays against the rubric. At most, their "predictions" are about how the IB will score the same essays using the same rubric.

Perhaps teachers are simply more familiar with the material, expectations, and assessments of course subjects than with the topics, expectations, and rubrics for Extended Essays.

Year-to-year Variations

At the aggregate level, most year-to-year variations are minor.

One exception (highlighted in yellow below) is that TOK had a spike of overprediction in 2016. That spike coincided with a marked increase in moderation of the TOK Presentation, which we analyzed and discussed in a previous article.
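A spike like the 2016 TOK one shows up when errors are grouped by exam year and the overprediction share is computed for each year. The sketch below assumes records of the form (year, error), with error = predicted minus awarded. The record layout and the sample values are hypothetical. The same grouping works with a subject name as the key instead of the year.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def overprediction_by_year(records: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Share of overpredicted scores per exam year.

    Each record is (year, error), where error = predicted - awarded.
    """
    totals: DefaultDict[int, int] = defaultdict(int)
    overs: DefaultDict[int, int] = defaultdict(int)
    for year, error in records:
        totals[year] += 1
        if error > 0:
            overs[year] += 1
    return {year: overs[year] / totals[year] for year in sorted(totals)}


# Hypothetical TOK records, not taken from our sample.
tok = [(2015, 0), (2015, 1), (2015, -1), (2015, 0),
       (2016, 1), (2016, 1), (2016, 0), (2016, 2)]
print(overprediction_by_year(tok))  # {2015: 0.25, 2016: 0.75}
```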

Individual Subjects

Schools do not all offer the same subjects, and enrollments between subjects vary substantially. Generally speaking, we included subjects only when we had results from at least 100 students and 10 schools in each year.

Many subjects in our sample vastly exceeded this cutoff, while others just met it. For example, Mathematics SL has results from more than 2,000 students and all 70 schools each year, whereas German B SL averages about 100 students and 17 schools each year.

Our filter left us with 40 course subjects and 10 extended essay subjects, plus Theory of Knowledge and diploma scores.
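The inclusion rule (at least 100 students and 10 schools in each year) can be sketched as a simple filter. This version applies the rule strictly, whereas our own filter was applied "generally speaking". The record layout (subject, year, school ID per student result) and the function name are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple


def subjects_meeting_cutoff(
    records: Iterable[Tuple[str, int, str]],
    years: Iterable[int],
    min_students: int = 100,
    min_schools: int = 10,
) -> List[str]:
    """Return subjects that meet both cutoffs in every listed year.

    Each record is (subject, year, school_id) for one student's result.
    """
    students: DefaultDict[Tuple[str, int], int] = defaultdict(int)
    schools: DefaultDict[Tuple[str, int], Set[str]] = defaultdict(set)
    subjects: Set[str] = set()
    for subject, year, school in records:
        students[(subject, year)] += 1
        schools[(subject, year)].add(school)
        subjects.add(subject)

    year_list = list(years)
    return [
        subject
        for subject in sorted(subjects)
        if all(
            students[(subject, y)] >= min_students
            and len(schools[(subject, y)]) >= min_schools
            for y in year_list
        )
    ]


# Toy example with tiny cutoffs so the effect of the filter is visible.
records = [
    ("Mathematics SL", 2016, "s1"), ("Mathematics SL", 2016, "s2"),
    ("Mathematics SL", 2017, "s1"), ("Mathematics SL", 2017, "s3"),
    ("German B SL", 2016, "s1"), ("German B SL", 2017, "s1"),
]
print(subjects_meeting_cutoff(records, [2016, 2017], min_students=2, min_schools=2))
# ['Mathematics SL']
```

A subject with no results at all in one of the listed years fails the check, because its count for that year is zero.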

You can see each of the included subjects here:

Predicted vs. Awarded - Subject Scores – 2013-2017 - 70 Schools

Predicted vs. Awarded - TOK and EE Scores – 2013-2017 - 70 Schools

Some individual subjects show year-to-year variations that might be interesting and meaningful to those who teach them.

Between-Subject Comparisons

There are considerable variations between subjects. For instance…

What’s interesting here is that for the first three subjects shown, scores tended to be UNDERpredicted.

In contrast, Computer Science HL, shown on the last line of the table, had the lowest level of accuracy, both in average absolute difference between predicted and awarded scores and in the percentage of perfectly accurate predictions. It also had one of the largest amounts of overprediction.

It should be noted, however, that of the 43 course subjects meeting our filter for inclusion, Computer Science HL was at the lower end in terms of the number of students and the number of schools in its sample. (The number of students and schools in each subject's sample is included in the PDF tables linked above.)

We also need to remember that when looking at different subjects, we’re not only looking at different sets of students (because students don't all take the same subjects), but also different sets of schools (because not all schools offer the same subjects).

Again, you can see each of the included subjects here:

Predicted vs. Awarded - Subject Scores – 2013-2017 - 70 Schools

Extended Essay Scores

There was considerably less variation in prediction accuracy between the extended essay subjects. Most interestingly...

All 10 extended essay subjects in our analysis had considerably more overprediction than underprediction.

The greatest skew occurred with Economics EE, where 53% of scores were overpredicted and just 12% were underpredicted.

The least skew occurred with English A EE and Visual Arts EE, where 36% of scores were overpredicted and 23 to 24% were underpredicted.

Again, you can see each of the included EE subjects here:

Predicted vs. Awarded - TOK and EE Scores – 2013-2017 - 70 Schools

Diploma Scores

In this analysis, and in our reports for schools, we calculate predicted diploma points ourselves by summing the individual subject predictions and adding the relevant bonus points based on the EE and TOK predictions.
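The calculation can be sketched as follows. A Diploma candidate takes six subjects, each scored 1 to 7, and bonus points come from a lookup keyed by the EE and TOK grades. The bonus table below is a hypothetical excerpt for illustration only. Substitute the official IB matrix for the exam session you are analyzing, since it has not been identical in every session.

```python
from typing import Dict, Mapping, Sequence, Tuple

# Hypothetical excerpt of an EE/TOK bonus-point lookup, keyed by
# (EE grade, TOK grade). Not the official IB matrix.
HYPOTHETICAL_CORE_POINTS: Dict[Tuple[str, str], int] = {
    ("A", "A"): 3,
    ("B", "B"): 2,
    ("C", "C"): 1,
}


def predicted_diploma_points(
    subject_predictions: Sequence[int],
    ee_prediction: str,
    tok_prediction: str,
    core_points: Mapping[Tuple[str, str], int] = HYPOTHETICAL_CORE_POINTS,
) -> int:
    """Sum the six subject predictions and add the EE/TOK bonus points."""
    if len(subject_predictions) != 6:
        raise ValueError("expected six subject predictions")
    key = (ee_prediction.upper(), tok_prediction.upper())
    if key not in core_points:
        raise KeyError(f"no bonus entry for EE/TOK combination {key}")
    return sum(subject_predictions) + core_points[key]


print(predicted_diploma_points([6, 6, 5, 5, 7, 6], "B", "B"))  # 35 + 2 = 37
```

Raising an error for a missing EE/TOK combination avoids silently treating an unknown pair as zero bonus points.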

Some interesting trends:

In each of the last five years…

More than 50% of diploma scores were overpredicted.

Around 30% were underpredicted.

Also, in each of the last five years, about…

15% of predicted diploma scores were perfectly accurate.

43% of predicted diploma scores were within 1 point of the awarded total.

66% of predicted diploma scores were within 2 points of the awarded total.

81% of predicted diploma scores were within 3 points of the awarded total.

91% of predicted diploma scores were within 4 points of the awarded total.

96% of predicted diploma scores were within 5 points of the awarded total.
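The "within N points" figures above are cumulative shares of absolute errors, with N = 0 being the perfectly accurate predictions. The following is a minimal sketch, assuming diploma-point errors computed as predicted minus awarded. The function name and sample errors are hypothetical.

```python
from typing import Dict, Sequence


def within_k_shares(errors: Sequence[int], max_k: int = 5) -> Dict[int, float]:
    """Share of predictions whose absolute error is at most k, for k = 0..max_k."""
    n = len(errors)
    if n == 0:
        raise ValueError("need at least one prediction")
    return {k: sum(abs(e) <= k for e in errors) / n for k in range(max_k + 1)}


# Hypothetical diploma-point errors for ten students.
diploma_errors = [0, 1, -1, 2, 3, -2, 1, 4, 0, 6]
shares = within_k_shares(diploma_errors)
for k, share in shares.items():
    print(f"within {k} point(s): {share:.0%}")
```

Because each share counts every prediction at or below the threshold, the values can only stay the same or increase as k grows.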

You can see the full table of predicted vs. awarded diploma points here:

Predicted vs. Awarded - Diploma Scores – 2013-2017 - 70 Schools

Your Own Predicted vs. Awarded Scores

These findings are intriguing, but the most important insights about predicted vs. awarded scores will come from your own school’s data, generated by your own program, teachers, and students.

If you haven’t already, we encourage you to review trends in predicted vs. awarded scores within each of your courses, EE subjects, TOK, and total diploma points. We can help with that, of course: our report package includes displays of predicted vs. awarded scores for every subject at your school, including TOK and Extended Essays, as well as total diploma points, like these:

Limitations

Although our sample of 70 schools is quite diverse in terms of location, enrollment, and length of offering the IB Diploma, we do not claim that it is statistically representative of all IB schools, all IB schools in any particular region, or all IB Score Reports client schools. One should be cautious about generalizing our findings. Our goal, as always, is to help teachers and schools, using the best data we have available.

Would You Like To Have IB Score Reports For Your School?

Getting started is easy. Just send us an email: support@acadamigo.com