Simulating Election Results using Machine Learning & Data Science

Sumanta Adhikari

Published Jun 2, 2021

In my previous article, “Application of Machine Learning: Replacing Exit Polls by Data Science” [ https://www.garudax.id/pulse/application-machine-learning-replacing-exit-polls-data-adhikari/ ], I highlighted how Data Science can replace, or at least minimize the effect of, exit polls. This is especially useful in the Indian context, where exit polls are expensive and media agencies often go beyond their budgets to conduct them. More importantly, their results are highly unreliable, mainly because of minuscule sample sizes and poorly chosen sample locations. That is why, despite the humongous amount of time and money spent, the projections are often far from reality. Even the projections from different agencies frequently do not match each other.

In fact, in that article, I proposed an alternative to exit polls: projecting the results from historical data.

In this article, I explain in detail how one can project different scenarios, and the potential result of each scenario, using historical election data together with various Machine Learning and other simulation techniques.

For that purpose, let me pick my own state, West Bengal, India, and project its result for the next Parliamentary Election, to be held in 2024. To set the background, West Bengal has 42 Parliamentary Constituencies (hereafter PCs) and 294 Assembly Constituencies (hereafter ACs), with 7 ACs mapped to each PC (42 × 7 = 294). Since the realignment of constituencies, the following elections have been held:

· 2009 – Parliamentary Election

· 2011 – Assembly Election

· 2014 – Parliamentary Election

· 2016 – Assembly Election

· 2019 – Parliamentary Election

· 2021 – Assembly Election

For each of these elections, the following details are available on the Election Commission’s website, if you search it carefully:

· Total Electors (people who are registered as voters for the constituency)

· Total Voters (people who have actually voted in the election)

· Male Electors

· Female Electors

· Male Voters

· Female Voters

· Votes polled by each of the major parties / groups

These details are available at booth level, or at the higher AC or PC levels. For an apples-to-apples comparison, it is recommended to work with everything at AC level and roll it up to PC level.
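To make the AC-level unit concrete, here is a minimal sketch of one record per AC per election. The class name `ACResult`, its field names and the sample numbers are hypothetical, chosen only to mirror the list above; the actual Election Commission files use their own layouts.

```python
from dataclasses import dataclass, field


@dataclass
class ACResult:
    """One Assembly Constituency (AC) in one election (hypothetical schema)."""

    election_year: int
    ac_no: int
    pc_no: int  # the Parliamentary Constituency this AC maps to
    total_electors: int
    total_voters: int
    male_electors: int
    female_electors: int
    male_voters: int
    female_voters: int
    party_votes: dict[str, int] = field(default_factory=dict)

    @property
    def turnout(self) -> float:
        """Share of registered electors who actually voted."""
        return self.total_voters / self.total_electors

    @property
    def female_turnout(self) -> float:
        """Share of registered female electors who actually voted."""
        return self.female_voters / self.female_electors


# Illustrative numbers only, not real data.
ac = ACResult(2021, 1, 1, 200_000, 160_000, 100_000, 100_000, 82_000, 78_000,
              {"TMC": 80_000, "BJP": 70_000})
print(f"{ac.turnout:.0%}")  # 80%
```

Keeping every election at the same AC grain is what makes the later roll-up to PC level a simple sum.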

With this background, I am trying to simulate the results of the 2024 Parliamentary Election for the 42 seats in West Bengal, along with the corresponding leads in the 294 ACs. The following approach was taken:

Step #1: Forecast Electors and Voters (Male, Female & Total) for 2024. This step is very important, and its projections will be replaced by the actual values once they are available in 2024.
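The article does not say which forecasting method is used in Step #1. As a minimal sketch, assuming a simple linear trend over the six post-realignment elections is acceptable, each series (Male, Female or Total Electors / Voters) can be extrapolated to 2024 with ordinary least squares. The function name and the elector figures below are hypothetical.

```python
def linear_trend_forecast(years, values, target_year):
    """Fit values = a + b * year by ordinary least squares and extrapolate."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    # The OLS line always passes through (mean_x, mean_y).
    return mean_y + slope * (target_year - mean_x)


ELECTION_YEARS = [2009, 2011, 2014, 2016, 2019, 2021]

# Hypothetical electors for one AC, for illustration only.
electors = [180_000, 185_000, 192_000, 197_000, 204_000, 209_000]
print(round(linear_trend_forecast(ELECTION_YEARS, electors, 2024)))
```

Forecasting Male and Female series separately and adding them will not necessarily equal a directly forecast Total; one design choice is to forecast the components and derive the Total from them.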

Step #2: Using the historical values and the projected 2024 values of Electors & Voters, we predict the votes each party or group is expected to get. One thing to note: I assume the parties and groups will remain the same as they contested in the most recent (2021) election. But we all know that politics makes strange bedfellows; if the alliances are reorganised, the whole analysis needs to be recalculated.

For now, I have grouped the votes into 4 buckets:

1. TMC (The Ruling Party)

2. BJP (Main Opposition)

3. The Front (Left parties along with Congress & other smaller parties who fought together in 2021)

4. Others (including NOTA – None of The Above option)
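Grouping raw party-wise votes into these 4 buckets is a simple lookup. In this sketch, the party labels in the hypothetical `BUCKET_OF` table are illustrative and deliberately incomplete; anything not listed, including NOTA, falls into Others, as in bucket 4 above. A real mapping would have to list every Front constituent for the relevant election.

```python
# Illustrative and deliberately incomplete: extend with every Front constituent.
BUCKET_OF = {
    "AITC": "TMC",  # All India Trinamool Congress
    "BJP": "BJP",
    "CPI(M)": "Front",
    "INC": "Front",
}
BUCKETS = ("TMC", "BJP", "Front", "Others")


def to_buckets(party_votes):
    """Collapse {party: votes} into the 4 buckets; unmapped parties and NOTA go to Others."""
    totals = {bucket: 0 for bucket in BUCKETS}
    for party, votes in party_votes.items():
        totals[BUCKET_OF.get(party, "Others")] += votes
    return totals


print(to_buckets({"AITC": 95_000, "BJP": 80_000, "CPI(M)": 9_000,
                  "INC": 4_000, "NOTA": 1_500, "IND": 2_000}))
# {'TMC': 95000, 'BJP': 80000, 'Front': 13000, 'Others': 3500}
```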

Step #3: Use as many algorithms as possible to predict the number of votes each party will get in 2024 in each seat. I have used multiple regression-type and forecasting-type algorithms, at both AC and PC levels. The AC-level results were then rolled up to PC level.
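Rolling AC-level predictions up to PC level amounts to summing each bucket's votes over the ACs of every PC (7 per PC in West Bengal). A minimal sketch, assuming a hypothetical `ac_to_pc` mapping built from the delimitation data; the helper `pc_winners` is likewise illustrative.

```python
from collections import defaultdict


def roll_up_to_pc(ac_predictions, ac_to_pc):
    """Sum each bucket's predicted AC-level votes into its parent PC.

    ac_predictions: {ac_no: {bucket: votes}}
    ac_to_pc:       {ac_no: pc_no}, e.g. built from the delimitation data
    """
    pc_totals = defaultdict(lambda: defaultdict(float))
    for ac_no, bucket_votes in ac_predictions.items():
        pc_no = ac_to_pc[ac_no]
        for bucket, votes in bucket_votes.items():
            pc_totals[pc_no][bucket] += votes
    return {pc: dict(votes) for pc, votes in pc_totals.items()}


def pc_winners(pc_votes):
    """Leading bucket in each PC."""
    return {pc: max(votes, key=votes.get) for pc, votes in pc_votes.items()}


# Toy example with 3 ACs and 2 PCs (illustrative only).
rolled = roll_up_to_pc(
    {1: {"TMC": 100, "BJP": 50}, 2: {"TMC": 30, "BJP": 70}, 3: {"TMC": 10, "BJP": 20}},
    {1: 1, 2: 1, 3: 2},
)
print(pc_winners(rolled))
```

The same function also yields AC-level leads for free: they are simply the winners before the roll-up.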

Step #4: Since each algorithm gives a different set of values by constituency and by party, we can simulate different combinations of them and apportion each combination to the total projected voters, creating one set of projected results per combination.
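One way to read "apportion it with the Total projected voters": when each bucket's votes come from a different algorithm, the four numbers will not add up to the projected total, so they are rescaled proportionally. This is a sketch under that assumption; clipping negative predictions to zero is my own addition, since regression models can produce them.

```python
def apportion(raw_votes, projected_total_voters):
    """Rescale one combination of bucket predictions so it sums to the projected voters.

    Negative predictions (possible with regression models) are clipped to zero first.
    """
    clipped = {bucket: max(votes, 0.0) for bucket, votes in raw_votes.items()}
    raw_total = sum(clipped.values())
    if raw_total == 0:
        raise ValueError("all predictions are zero or negative")
    return {bucket: projected_total_voters * votes / raw_total
            for bucket, votes in clipped.items()}


print(apportion({"TMC": 60, "BJP": 30, "Front": 8, "Others": 2}, 1000))
```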

Mathematically, if 10 different algorithms produce 10 different sets of results for the 4 buckets mentioned above, we can generate 10⁴ (10,000) sets of results. We then need to decide which of them, or which weighted average of some of them, represent likely scenarios.
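The 10⁴ count comes from letting each of the 4 buckets independently take its prediction from any of the 10 algorithms. A sketch using `itertools.product`, assuming (hypothetically) that predictions are stored as `predictions[algorithm][constituency][bucket]` and that one choice of algorithm per bucket applies to all constituencies.

```python
from itertools import product

BUCKETS = ("TMC", "BJP", "Front", "Others")


def enumerate_scenarios(predictions):
    """Yield (choice, result) for every way of picking one algorithm per bucket.

    predictions[algorithm][constituency][bucket] -> predicted votes
    choice[bucket]                               -> algorithm used for that bucket
    result[constituency][bucket]                 -> votes under that choice
    """
    algorithms = sorted(predictions)
    constituencies = list(predictions[algorithms[0]])
    for picks in product(algorithms, repeat=len(BUCKETS)):
        choice = dict(zip(BUCKETS, picks))
        result = {
            c: {b: predictions[choice[b]][c][b] for b in BUCKETS}
            for c in constituencies
        }
        yield choice, result
```

Because this is a generator, each of the 10,000 results can be apportioned (Step #4) and scored one at a time instead of holding them all in memory.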

Step #5: From these, we simulate 4 scenarios of particular interest:

1. Most Likely Scenario (a scenario representing a central value of all 10,000 sets of results)

2. Best Case Scenario for TMC (out of these 10,000 possible scenarios, the best logically possible scenario for TMC)

3. Best Case Scenario for BJP (out of these 10,000 possible scenarios, the best logically possible scenario for BJP)

4. Best Case Scenario for Front (out of these 10,000 possible scenarios, the best logically possible scenario for Front)

While the first scenario gives an idea of the central tendency of the upcoming election, the other three give an idea of the extremes.
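The article does not define "central value" or "logical" precisely. As one possible reading, the sketch below counts seats in each simulated result, takes the per-bucket median seat count, and picks the result closest to it (sum of absolute differences) as the Most Likely Scenario; a party's Best Case is simply the result giving it the most seats. Any plausibility filter for "logical" scenarios would have to be applied before this step and is not shown. All function names are hypothetical.

```python
from statistics import median


def count_seats(result):
    """result[constituency][bucket] -> votes; returns {bucket: seats led}."""
    seats = {}
    for votes in result.values():
        winner = max(votes, key=votes.get)
        seats[winner] = seats.get(winner, 0) + 1
    return seats


def most_likely(seat_counts, buckets):
    """Index of the result closest (L1 distance) to the per-bucket median seat count."""
    med = {b: median(s.get(b, 0) for s in seat_counts) for b in buckets}
    return min(
        range(len(seat_counts)),
        key=lambda i: sum(abs(seat_counts[i].get(b, 0) - med[b]) for b in buckets),
    )


def best_case(seat_counts, bucket):
    """Index of the result giving `bucket` the most seats."""
    return max(range(len(seat_counts)), key=lambda i: seat_counts[i].get(bucket, 0))


# Toy seat counts for 5 simulated results over 5 seats (illustrative only).
toy = [{"TMC": 3, "BJP": 2}, {"TMC": 5, "BJP": 0}, {"TMC": 1, "BJP": 4},
       {"TMC": 2, "BJP": 3}, {"TMC": 4, "BJP": 1}]
print(most_likely(toy, ("TMC", "BJP")), best_case(toy, "TMC"), best_case(toy, "BJP"))
```

Picking an actual simulated result, rather than reporting the medians themselves, keeps the Most Likely Scenario internally consistent at constituency level.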

Step #6: Almost all the models developed have one thing in common: projected values (like Total Electors or Total Voters for 2024) are used as features. When the Election Commission publishes the actual number of Electors in 2024, we need to update those values accordingly. We also need to track the percentage of votes polled, as published by the Election Commission on polling day, and replace the projected Voters with the actual figures. These changes will fine-tune the models and bring the numbers closer to the actual results.
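A minimal sketch of the Step #6 update, assuming the polling percentage is published as a share of electors: use actual Electors when available, derive Voters from the published polling percentage if available, and otherwise keep the projected turnout ratio. The function name `refresh_inputs` and the numbers are hypothetical.

```python
def refresh_inputs(projected_electors, projected_voters,
                   actual_electors=None, polled_pct=None):
    """Return (electors, voters), preferring official figures over projections."""
    electors = actual_electors if actual_electors is not None else projected_electors
    if polled_pct is not None:
        # Official polling percentage applied to the electors.
        voters = electors * polled_pct / 100
    else:
        # Keep the projected turnout ratio, applied to the (possibly updated) electors.
        voters = electors * projected_voters / projected_electors
    return electors, round(voters)


print(refresh_inputs(200_000, 160_000, actual_electors=210_000, polled_pct=82.5))
```

The refreshed Voters figure then feeds back into the models and into the apportioning step, so every scenario is recomputed with the official numbers.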

Step #7: The final projection will be published after all these changes are made, on the last day of the election once voting ends.

Let us focus on the inferences we can derive for each scenario:

· The projected winner for each PC

· Total Votes polled by each of the 4 buckets defined above at PC as well as AC level

· Percentage of votes polled by each of the buckets.

· Leads at AC level

· Many seats are projected with a minimal gap between the first and second parties, so the results in these seats could easily swap within the error level. So, apart from giving a point estimate of the number of seats each party is projected to win, we also create a range of seats around it.
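The article does not spell out how the seat range is built. One reading that fits the numbers: a seat is treated as swappable when the winner's margin over the runner-up is below the error level (2% of votes polled in that constituency), so a party's low end drops its narrow wins and its high end adds the seats it narrowly loses. Under this reading, Scenario 1's TMC range of 21 to 26 around 22 would mean 1 narrow TMC win and 4 narrow BJP wins, which is consistent with the BJP range of 16 to 21 around 20. The function below is a hypothetical sketch of that reading.

```python
def seat_range(pc_results, error_level=0.02):
    """pc_results: {pc: {bucket: votes}}. Returns {bucket: (point, low, high)}."""
    point, narrow_wins, narrow_losses = {}, {}, {}
    for votes in pc_results.values():
        total = sum(votes.values())
        first, second = sorted(votes, key=votes.get, reverse=True)[:2]
        point[first] = point.get(first, 0) + 1
        margin = (votes[first] - votes[second]) / total
        if margin < error_level:
            # The seat could swap between the top two buckets.
            narrow_wins[first] = narrow_wins.get(first, 0) + 1
            narrow_losses[second] = narrow_losses.get(second, 0) + 1
    return {
        b: (point.get(b, 0),
            point.get(b, 0) - narrow_wins.get(b, 0),
            point.get(b, 0) + narrow_losses.get(b, 0))
        for b in set(point) | set(narrow_losses)
    }


toy = {"A": {"TMC": 50, "BJP": 49, "Front": 1},
       "B": {"TMC": 40, "BJP": 55, "Front": 5},
       "C": {"TMC": 45, "BJP": 46, "Front": 9}}
print(seat_range(toy))
```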

While Step #6 and Step #7 can be executed only in 2024, we can still create the initial scenarios. Here are my analysis results:

Scenario 1: Most Likely Scenario (a scenario representing the central value of all the sets of results)

· PC level results:

o TMC – 22

o BJP – 20

· Voting Percentages:

o TMC – 46.4%

o BJP – 44%

o Front – 7.5%

o Others – 2.1%

· AC level Leads:

o TMC – 153

o BJP – 141

o Front – 0

· Projected Range of seats at 2% error level

o PC Level

§ TMC – 21 – 26

§ BJP – 16 – 21

o AC Level

§ TMC – 136 – 171

§ BJP – 123 – 158

§ Front – 0

Scenario 2: Best Case Scenario for TMC (out of all the possible scenarios, the best logically possible scenario for TMC)

· PC level results:

o TMC – 33

o BJP – 9

· Voting Percentages:

o TMC – 49.1%

o BJP – 38.3%

o Front – 9.3%

o Others – 3.3%

· AC level Leads:

o TMC – 224

o BJP – 70

o Front – 0

· Projected Range of seats at 2% error level

o PC Level

§ TMC – 32 – 36

§ BJP – 6 – 10

o AC Level

§ TMC – 214 – 239

§ BJP – 55 – 80

§ Front – 0

Scenario 3: Best Case Scenario for BJP (out of all the possible scenarios, the best logically possible scenario for BJP)

· PC level results:

o TMC – 6

o BJP – 36

· Voting Percentages:

o TMC – 37.6%

o BJP – 51.0%

o Front – 8.4%

o Others – 2.9%

· AC level Leads:

o TMC – 49

o BJP – 245

o Front – 0

· Projected Range of seats at 2% error level

o PC Level

§ TMC – 4 – 7

§ BJP – 35 – 38

o AC Level

§ TMC – 44 – 54

§ BJP – 240 – 250

§ Front – 0

Scenario 4: Best Case Scenario for The Front (out of all the possible scenarios, the best logically possible scenario for The Front)

· PC level results:

o TMC – 25

o BJP – 17

· Voting Percentages:

o TMC – 42.3%

o BJP – 39.3%

o Front – 15.0%

o Others – 3.4%

· AC level Leads:

o TMC – 163

o BJP – 128

o Front – 3

· Projected Range of seats at 2% error level

o PC Level

§ TMC – 22 – 29

§ BJP – 13 – 20

o AC Level

§ TMC – 144 – 184

§ BJP – 107 – 146

§ Front – 3 – 4

Significance of the scenarios: Creating scenarios is arguably what makes this approach unique. It can help:

1. Political Parties – They can anticipate their best and worst possible scenarios well before the actual election and strategize accordingly.

2. Political Analysts – They can explore different possible outcomes, simulate the scenario they believe to be the most likely, and analyse its projected results in depth.

3. Psephologists – Based on the latest trends available during the election, they can superimpose the data collected from different sample surveys on the possible results generated by the tool.

4. Common Citizens – There is a common belief that all the Opinion Polls / Exit Polls conducted by media houses are heavily funded and influenced by political parties. So, people like us can simulate a logical result on our own and need not rely on the media houses’ projections.
