So I’m sure you’ve read a thing or two about all those crazy electronic voting machines being inaccurate. One thing I find slightly perplexing is why the misrepresented votes always seem to be in favor of the Republican party. I don’t get it. It’s not like the voting machine companies would be so blatant or stupid as to try to rig an election so outright, especially in a world that is already so suspicious of electronic machines. But if it were purely a bug, wouldn’t it be equally likely that Democrats benefit? Of course, this could be explained by a procedure for inputting candidates in which Republicans and Democrats get placed into the system in a specific order (such as Democrats being added first). Who knows.
So with the constant attention those digital voting machines get, a lot of people ask, “WHAT is so difficult about writing software that tallies votes?” Now I’m not one to study the Diebold machines, but I thought it would be interesting to pick at the problem.
First of all, the votes must be logged. But not in just any log. It must be secure and immune from tampering. And when I say “tampering,” I mean tampering by anybody. That includes the developers, the database administrators, the voters, and the polling staff. I can only imagine that they use a bunch of one-way hardware encryption and md5 checksums.
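To make the checksum idea a bit more concrete, here is a minimal sketch of a hash-chained log, where every entry commits to the one before it. This is my own illustration, not how any real voting machine works. The names `append_vote`, `verify`, and `GENESIS` are made up, and I swapped md5 for SHA-256 because MD5 is no longer considered collision resistant.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the payload together with the hash of the previous entry."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def append_vote(log: list, payload: dict) -> None:
    """Append a vote record that commits to everything before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "hash": entry_hash(prev_hash, payload)})


def verify(log: list) -> bool:
    """Recompute the chain; an edited entry, or one cut from the middle, breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_vote(log, {"ballot": 1, "candidate_id": 1})
append_vote(log, {"ballot": 2, "candidate_id": 2})
append_vote(log, {"ballot": 3, "candidate_id": 1})
print(verify(log))  # chain is intact at this point

log[1]["payload"]["candidate_id"] = 1  # someone quietly flips a vote
print(verify(log))  # the stored hash no longer matches the payload
```

Note what this does not do: anyone who can rewrite the whole file can recompute every hash, and chopping entries off the end goes unnoticed. The latest hash would have to be kept somewhere the tamperer can't reach, which is exactly the "from everybody" problem again.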
The votes would need to be isolated from each other from the data integrity perspective: if vote #35252 breaks the system, all prior votes (#1 through #35251) must remain unscathed. Although most modern databases use transactions to ensure data integrity, I would imagine there is no foolproof means without creating a replica of the vote in a second or third physical location.
Of course, such data replication causes problems when the copies disagree. What happens if the primary fails and the vote was only recorded on one of the two slaves? Do you count that half vote? What if a replication error had occurred where one slave copied something differently from the primary? Which is right? These things happen (database corruption), and they tend to clump together and result in catastrophic failures.
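One common answer to "which is right?" is to only trust a value that a strict majority of the copies agree on, and kick everything else to a human. Here is a rough sketch of that rule. The function name `reconcile` and the convention that `None` means "this copy has no record" are my own assumptions for illustration.

```python
from collections import Counter
from typing import Optional


def reconcile(copies: list) -> Optional[int]:
    """Return the candidate ID a strict majority of replicas agree on.

    `copies` holds what each replica recorded for one ballot; None means
    that replica has no record. Returns None when there is no strict
    majority, meaning someone has to investigate by hand.
    """
    recorded = Counter(c for c in copies if c is not None)
    if not recorded:
        return None
    value, count = recorded.most_common(1)[0]
    return value if count * 2 > len(copies) else None


print(reconcile([1, 1, 1]))        # all three copies agree
print(reconcile([1, 1, 2]))        # one copy was corrupted
print(reconcile([1, None, None]))  # the "half vote": only one copy has it
print(reconcile([1, 2, None]))     # two copies disagree, one is missing
```

Under this rule the "half vote" is not counted automatically, which may or may not be what you want. It just turns a silent inconsistency into a visible one.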
Purposeful Fraud Issues
Let’s attack this from another angle. The main culprit behind election day problems will probably be human “error.” An electronic machine must protect against this. Unlike a punch card that the actual human physically pokes, a digital machine does the card punching for you (on its hard drive), which is almost like telling someone else to punch in your vote as you specify.
There have been instances of programmers placing bugs in slot machines that gave them jackpots if they bet in a certain order. There have been cases of system administrators leaving back doors into the servers. There’s a huge list of historical events that show that no system, no matter how hard a company tries, is secure from malicious employees. But that is exactly what this system must be designed to fight. How would you ensure it is safe? Peer code reviews? Multi-part passwords that require three separate people with three separate passwords to authenticate? Physical keys, like the ones you see in movies, where two people have to turn different keys at the same time to open a machine? Okay, so let’s say you somehow secure your employees. The problem doesn’t stop there.
I’m setting up the machines. “Let’s see,” I say to myself with a grin, “Kerry is going to be candidate 1, and Bush will be candidate 2… for now.” At the end of the night, I go back and say, “Oops, I meant 1 equates to Bush and 2 equates to Kerry!” With any regular database, this is entirely possible, and everybody’s votes just got reversed. Of course a smart voting machine would never let you change the name on an existing record. But then again, hackers don’t need to worry about that.
So the voting machine company decides that you “can’t” change the name of a candidate after it’s been put into the system. What happens if I were to put in a second “Bush” to split his votes with his mystical twin? Or what happens if I create a new candidate halfway through the election under his name? Well, in some instances, the software might just show him twice (this is good), and in others, it would show him once (this is very bad). In crappier software, that of course means voters would be voting for one OR the other “Bush,” but nobody would know exactly which.
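Here is roughly what those rules look like if you write them down: a candidate table that refuses renames, refuses a second candidate with the same name for the same office, and refuses any additions once it has been sealed. The class `CandidateRegistry` and its methods are hypothetical names I made up for this sketch.

```python
class CandidateRegistry:
    """Write-once candidate table: no renames, no duplicates, no late adds."""

    def __init__(self) -> None:
        self._entries = {}    # candidate_id -> (name, office)
        self._sealed = False

    def add(self, candidate_id: int, name: str, office: str) -> None:
        if self._sealed:
            raise RuntimeError("registry is sealed; no candidates can be added")
        if candidate_id in self._entries:
            raise ValueError(f"ID {candidate_id} is already assigned")
        if (name, office) in self._entries.values():
            raise ValueError(f"{name} is already running for {office}")
        self._entries[candidate_id] = (name, office)

    def seal(self) -> None:
        """Call once before the polls open; after this, add() always fails."""
        self._sealed = True

    def name_of(self, candidate_id: int) -> str:
        return self._entries[candidate_id][0]


registry = CandidateRegistry()
registry.add(1, "Kerry", "President")
registry.add(2, "Bush", "President")
try:
    registry.add(3, "Bush", "President")  # the "mystical twin"
except ValueError as err:
    print("rejected:", err)
registry.seal()
try:
    registry.add(3, "Nader", "President")  # halfway through the election
except RuntimeError as err:
    print("rejected:", err)
```

There is deliberately no rename method, so the "Oops, I meant 1 is Bush" trick has nothing to call. Of course this only protects against people who go through the application. Someone editing the underlying storage directly walks right past it.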
Of course the voting company would protect us from ourselves by ensuring candidates can’t be added in after the machine is shipped out. But therein lies another problem.
Let’s say you’re running the voting company that is running an election across a few dozen districts. Of course, all the votes must be tallied. A “Bush” vote in one county must group up with a “Bush” vote in another. But how? The human answer is to use the name, but realistically, we know that another “Bush” might be running for a different position in some counties. You can’t just use the name as the key because it is not unique. So you would use IDs, I presume.
But of course this means every machine must use an ID that is not internal to it. You would say, “All 1s are for Kerry and all 2s are for Bush!” Now that this is decided, you would ship out all of the machines to only accept votes for Kerry = 1 and Bush = 2. And when a machine gets back, you would save its results into the main system as 1 = Kerry and 2 = Bush.
But where’s the sanity check? Who knows what happened while that box was out there in the wild? How do you know that 1 is indeed still representing Kerry for that box? How do you know that everybody who voted “Kerry” on that box got saved as a “1”? This is even more of a problem if you do the counting right in the same place that the voting is taking place.
And even if you did use names, despite it being a horrible idea, how do you know that a “Kerry” vote got saved as “Kerry”? For all you know, there is a bug, and all Kerry votes are getting saved as “Bush” and all Bush votes are getting saved as “Nader” because someone forgot that array indices start at 0, not 1 (a theoretical technical explanation for how these bugs could arise).
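One possible sanity check for the first of those questions: fingerprint the ID-to-name mapping before the box ships, and refuse to merge its counts if the mapping it comes back with has a different fingerprint. This is just a sketch under my own assumptions. `manifest_fingerprint` and `merge_box` are invented names, and the vote counts are made-up numbers.

```python
import hashlib
import json


def manifest_fingerprint(manifest: dict) -> str:
    """Stable SHA-256 fingerprint of an ID -> candidate-name mapping."""
    canonical = json.dumps(sorted(manifest.items())).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def merge_box(totals: dict, box_manifest: dict, box_counts: dict, expected: str) -> None:
    """Add one box's counts to the totals only if its mapping is unchanged."""
    if manifest_fingerprint(box_manifest) != expected:
        raise ValueError("box manifest does not match the one that was shipped")
    unknown = set(box_counts) - set(box_manifest)
    if unknown:
        raise ValueError(f"box reported votes for unknown IDs: {sorted(unknown)}")
    for candidate_id, count in box_counts.items():
        totals[candidate_id] = totals.get(candidate_id, 0) + count


shipped = {1: "Kerry", 2: "Bush"}
expected = manifest_fingerprint(shipped)  # recorded centrally before shipping

totals = {}
merge_box(totals, {1: "Kerry", 2: "Bush"}, {1: 120, 2: 115}, expected)
try:
    # This box came back with the names swapped.
    merge_box(totals, {1: "Bush", 2: "Kerry"}, {1: 90, 2: 80}, expected)
except ValueError as err:
    print("rejected:", err)
print(totals)  # only the first box was counted
```

This only tells you the mapping on the box is the one you shipped. It says nothing about whether a press of the “Kerry” button was actually stored as a 1.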
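For the non-programmers, the off-by-one bug is easy to show. In this toy example (all names are mine), the buttons on screen are numbered 1 through 3, but the candidate list is stored starting at position 0.

```python
CANDIDATES = ["Kerry", "Bush", "Nader"]  # stored at positions 0, 1, 2


def record_vote_buggy(button: int) -> str:
    """Uses the 1-based button number directly as a 0-based index."""
    return CANDIDATES[button]


def record_vote_fixed(button: int) -> str:
    """Checks the range, then converts the 1-based button to a 0-based index."""
    if not 1 <= button <= len(CANDIDATES):
        raise ValueError(f"no such button: {button}")
    return CANDIDATES[button - 1]


print(record_vote_buggy(1))  # Bush  (the voter pressed Kerry)
print(record_vote_buggy(2))  # Nader (the voter pressed Bush)
print(record_vote_fixed(1))  # Kerry
print(record_vote_fixed(2))  # Bush
```

In the buggy version, pressing button 3 raises an `IndexError`, so at least that one is loud. The first two are the scary ones because nothing crashes and every vote looks perfectly valid.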
So of course, that means you would write a binary log of all activities that box experienced. But what is this log for? Auditing? Shouldn’t auditing be happening at every step of the way regardless? If anything, problems are much harder to catch in the digital version of voting so this audit trail would rarely if ever be used except in the most extreme cases. Okay, so I’ve convinced you that it should be used all the time, right? Okay, but then what?
Is it being replicated? Is it safe from incomplete transactions? Will a corrupted insert break the entire file? What happens if the power cuts out right as it is writing a record? Is the whole file toast? Suddenly you realize the log file must also use a database to ensure its integrity, possibly in a separate process so that it is isolated from the main vote records.
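Short of a full database, the usual trick for the power-cut question is to frame every record with its length and a checksum, force each write to disk, and have the reader stop at the first record that doesn't check out. Here is a bare-bones sketch. The record layout, the file name, and the function names are all assumptions of mine.

```python
import os
import struct
import tempfile
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of payload


def append_record(path: str, payload: bytes) -> None:
    """Append one framed record and force it to disk before returning."""
    frame = HEADER.pack(len(payload), zlib.crc32(payload)) + payload
    with open(path, "ab") as f:
        f.write(frame)
        f.flush()
        os.fsync(f.fileno())


def read_records(path: str) -> list:
    """Return every intact record, stopping at the first torn or corrupt one."""
    with open(path, "rb") as f:
        data = f.read()
    records = []
    offset = 0
    while offset + HEADER.size <= len(data):
        length, checksum = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != checksum:
            break  # power cut mid-write, or corruption: keep what came before
        records.append(payload)
        offset = start + length
    return records


path = os.path.join(tempfile.mkdtemp(), "votes.log")
append_record(path, b"ballot=1;candidate_id=1")
append_record(path, b"ballot=2;candidate_id=2")
with open(path, "ab") as f:
    # Simulate the power dying partway through the third record.
    f.write(HEADER.pack(23, 0) + b"ballot=3")
print(read_records(path))  # the two complete records survive
```

So a half-written record does not toast the whole file, it just gets dropped. Keep in mind that CRC32 only catches accidents. It does nothing against someone who edits a record and recomputes the checksum.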
But what the hell is the point of all this? If there is going to be a discrepancy, shouldn’t it have been caught during testing? Why go through all this trouble double logging and replicating all of this data?
The last point is the most important. You’ll notice that through simple logic, we suddenly had to have tons of auditing overhead to do something so simple. And despite your best testing efforts, things that should be absolutely positively without error are still being audited to ensure their integrity. So what happens when you overlook one of these “no-brainer” assumptions?
You get voter fraud.
This only covers some theoretical problems that I might face when trying to put together a voting machine. I would assume a well-funded corporation would generate a list of problems 10x this length. While tallying votes may be simple in concept, if your application must be 200% bug free and hacker proof, developing it becomes immensely difficult.
This still doesn’t explain my original thought about the Republican vote bias though.