Wednesday, September 06, 2006

Negative Data?

Here's a truly special idea - via those crazy Thrashers, I ran into this intriguing article from The Economist, which introduces the world of Negative Databases and how they might help with encryption and data security.

In a world where sensitive data frequently gets lost, data security folk are always trying to come up with the most secure way to store data. And it doesn't take a six-year-old to tell you that the best way to keep your data safe is to not have it there in the first place.
"Pshaw!", I hear you say, "You can't store it and not store it at the same time!", and in a way, you'd be right.

But then, in another, more accurate way, you'd be a bit wrong. Consider the following statement:
"All Ravens are Black"
From here, you could make all kinds of crazy assertions about all black things being ravens, but these are incorrect, despite being amusing. What's not incorrect is that:
"All Non-Black Things are Not Ravens"
Which, it turns out upon some reflection, is true: it's the contrapositive of the original statement, so it holds whenever the original does.
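A quick way to convince yourself of the raven logic is a truth table. The sketch below (Python; the helper names are just for illustration) checks every combination of "is a raven" and "is black", comparing the original statement with its contrapositive and with the tempting but wrong converse, "all black things are ravens".

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication: 'if p then q' is false only when p is true and q is false."""
    return (not p) or q

def truth_table():
    """One row per case: (is_raven, is_black, original, contrapositive, converse)."""
    rows = []
    for is_raven, is_black in product([False, True], repeat=2):
        original = implies(is_raven, is_black)                # ravens are black
        contrapositive = implies(not is_black, not is_raven)  # non-black things are not ravens
        converse = implies(is_black, is_raven)                # black things are ravens (the wrong one)
        rows.append((is_raven, is_black, original, contrapositive, converse))
    return rows

for row in truth_table():
    print(row)
```

The original and contrapositive columns agree in every row; the converse breaks on the row for a black thing that isn't a raven.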

So, the concept of a negative database is concerned with storing the absence of the things you'd like to store. If your customer database has a 20-character field for customer name, you'd store in that table every possible string of letters from the alphabet of your choice, up to 20 characters long, except the names of your clients. Let's call that table Non_Customers.
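Here's a minimal sketch of the idea at toy scale, in Python. The three-letter alphabet, the three-character field and the two customer names are all made up, and the function names are hypothetical. It's a naive illustration of the concept (enumerate every string that fits the field, leave out the real names), not the scheme from the research the article describes.

```python
from itertools import product

def all_strings(alphabet: str, max_len: int):
    """Yield every string of length 1..max_len over the alphabet (letters may repeat)."""
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

def build_non_customers(customers: set[str], alphabet: str, max_len: int) -> set[str]:
    """The negative table: everything that could fit in the field, minus the real names."""
    return {s for s in all_strings(alphabet, max_len) if s not in customers}

# Hypothetical toy setup: a 3-letter alphabet and a 3-character name field.
customers = {"ab", "cab"}
non_customers = build_non_customers(customers, "abc", 3)
print(len(non_customers))  # 37 (3 + 9 + 27 possible strings, minus 2 customers)
```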

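Let's also say you used the standard 26-letter English alphabet. Based on my amateur permutations math of n!/(n-r)!, that's 560,127,029,342,507,827,200,000 possible combinations of letters that you can cram into that field. Strictly speaking, that formula only counts 20-letter names with no repeated letters; once you allow repeats and shorter names, the real figure is tens of thousands of times bigger.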

Let's be really generous and say that you have 500,000 client records. So we end up with a table containing 560,127,029,342,507,826,700,000 records, all of which are precisely NOT your customers' names.
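The counting can be checked in a few lines of Python. `math.perm(n, r)` computes n!/(n-r)!, which reproduces the figures above; the second count allows repeated letters and names shorter than 20 characters (still ignoring spaces, capitals and punctuation), which is closer to what a real name field would hold.

```python
from math import perm

ALPHABET_SIZE = 26
FIELD_LENGTH = 20
CUSTOMERS = 500_000

# The n!/(n-r)! figure: 20-letter strings with no repeated letters.
no_repeats = perm(ALPHABET_SIZE, FIELD_LENGTH)

# Letters may repeat and names may be shorter than the field:
# 26^1 + 26^2 + ... + 26^20.
with_repeats = sum(ALPHABET_SIZE ** k for k in range(1, FIELD_LENGTH + 1))

print(f"{no_repeats:,}")              # 560,127,029,342,507,827,200,000
print(f"{no_repeats - CUSTOMERS:,}")  # 560,127,029,342,507,826,700,000
print(f"{with_repeats:.3e}")          # 2.073e+28
print(round(with_repeats / no_repeats))
```

The last line shows the with-repeats count is roughly 37,000 times the permutation figure.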

All of your SELECT statements are now a bit harder to write, but with a little work, you could theoretically piece together the precise data that was missing from the table. And if someone were to find the database table lying around on a laptop, they wouldn't actually have the data. They'd have everything else!
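Since SELECT statements came up, here's a toy sketch using Python's built-in sqlite3 module and an in-memory database. The alphabet, field width and customer names are hypothetical. It shows the two operations: checking whether a name is a customer (it is one exactly when it's absent from Non_Customers), and rebuilding the real list by enumerating everything that could fit in the field and subtracting the table with EXCEPT. Both only work this easily because the universe here is tiny.

```python
import sqlite3
from itertools import product

ALPHABET = "abc"           # hypothetical toy alphabet
MAX_LEN = 3                # hypothetical toy field width
CUSTOMERS = {"ab", "cab"}  # hypothetical customer names

def all_strings(alphabet, max_len):
    """Every string of length 1..max_len over the alphabet."""
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Non_Customers (name TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO Non_Customers (name) VALUES (?)",
    ((s,) for s in all_strings(ALPHABET, MAX_LEN) if s not in CUSTOMERS),
)

def is_customer(name: str) -> bool:
    # A name is a customer exactly when it is NOT in the negative table
    # (assuming the name fits the field's alphabet and width).
    row = conn.execute(
        "SELECT NOT EXISTS (SELECT 1 FROM Non_Customers WHERE name = ?)", (name,)
    ).fetchone()
    return bool(row[0])

# Reconstruction: enumerate the whole universe and subtract the negative table.
conn.execute("CREATE TEMP TABLE All_Names (name TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO All_Names (name) VALUES (?)",
                 ((s,) for s in all_strings(ALPHABET, MAX_LEN)))
recovered = sorted(r[0] for r in conn.execute(
    "SELECT name FROM All_Names EXCEPT SELECT name FROM Non_Customers"))

print(is_customer("cab"), is_customer("bab"))  # True False
print(recovered)                                # ['ab', 'cab']
```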

These numbers are stupidly big. When you consider that a very large database is classed as one with several billion rows, you can rest assured that the Non_Customers table isn't going to be working its way into your stored procedures anytime soon. But, as big as they are, they aren't infinite, which means that as processing power increases, maybe one day it will be possible to store your entire backup as a secure database shadow...
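To put "stupidly big" into rough numbers, the sketch below estimates the raw size of Non_Customers using the 560,127,029,342,507,826,700,000-row figure. The 20 bytes per row (one byte per character, fixed width, no indexes or page overhead) and the reading of "several billion rows" as 5 billion are assumptions for illustration only.

```python
ROWS = 560_127_029_342_507_826_700_000  # Non_Customers row count from the post
BYTES_PER_ROW = 20                       # assumption: 1 byte per char, no overhead
BIG_DB_ROWS = 5_000_000_000              # assumption: "several billion" taken as 5 billion

total_bytes = ROWS * BYTES_PER_ROW
print(f"{total_bytes / 10**24:.1f} yottabytes")                   # 11.2 yottabytes
print(f"{ROWS / BIG_DB_ROWS:.1e} times a 'very large' table")     # 1.1e+14 times a 'very large' table
```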


  1. If I stole your database of negative data, could I not just work out what is missing and therefore generate the real data?

    So, taking your example, I could go through and find out which customer names were missing, and then I'd have the actual list. A bit more work than having the list straight away, but the data is still all there, isn't it?

  2. Yeah, I agree, the whole negative data thing seems pointless, and unbelievably time-consuming for both the client and the potential hackers.

  3. Yeah, you're right - it is unbelievably time-consuming. But right there, you've hit the nail precisely on the head - remember that almost no encryption exists that can't be broken by someone, given enough time. Most security these days is based on the concept of making the maths so incredibly annoying that people won't bother.

    So, maybe if you had a supercomputer at your disposal, making a negative imprint of your DB for transporting data might not be such a bad idea. However, if the technology were readily and easily available, it would kind of defeat the purpose - it would just be a really slow way to store information...

    I just liked the idea of storing non-data... what sort of a kooky idea is that?
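The first commenter is right that the real data is recoverable in principle; the reply's point is that the cost is what protects it. As a back-of-the-envelope sketch, assuming (hypothetically) an attacker who can test one billion candidate names per second per machine, and work that splits perfectly across machines, walking once through every row of the 560,127,029,342,507,826,700,000-row table takes:

```python
ROWS = 560_127_029_342_507_826_700_000  # Non_Customers rows, from the post's figure
CHECKS_PER_SECOND = 1_000_000_000        # assumption: 1e9 checks per second per machine
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

def years_to_scan(rows: int, checks_per_second: float, machines: int = 1) -> float:
    """Time to touch every row once, assuming the work splits perfectly across machines."""
    return rows / (checks_per_second * machines) / SECONDS_PER_YEAR

print(f"{years_to_scan(ROWS, CHECKS_PER_SECOND) / 1e6:.1f} million years")        # 17.7 million years
print(f"{years_to_scan(ROWS, CHECKS_PER_SECOND, machines=1_000_000):.0f} years")  # 18 years
```

A million machines bring it down to under two decades, which is the "maths so incredibly annoying" trade-off in a nutshell: the protection lasts only as long as the numbers stay out of reach.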