Normalization

 

Pat Helland has been talking about immutable data for a while. He last post on ‘Normalization is for sissies’ is quite fun. A not-very-accurate post from Dare made me remember about it and pushed me to post this.

Pat is playing with two ideas.

One is that immutable data should not be normalized as normalization is designed to help you dealing with updates.

Another is that you actually don’t need to delete/update the database. ‘Deleting’ a row means setting a InvalidationTimestamp = now(), and updating a row means setting InvalidationTimestamp = now() and inserting a new row with SinceTimestamp = now() and InvalidationTimestamp = null (you actually need two sets of dates, but that’s for another post).

Now, if you put the two ideas together, all the data is immutable, so you don’t need to normalize anything. This means you will have a record that have all the ‘extended table’: the ‘base table’ and all the fields from related tables in your normalized model. If you have Orders, Customers, Countries, your tables will look like

Order: OrderId, OrderDate, CustomerId, CustomerName, CountryId, CountryName, SinceTimeStamp, InvalidationTimeStamp

Customer: CustomerId, CustomerName, CountryId, CountryName, SinceTimeStamp, InvalidationTimeStamp

Country: CountryId, CountryName, SinceTimeStamp, InvalidationTimeStamp

You will be wasting a lot of disk space, but that’s not something to worry about. The advantages of this approach are very important. You don’t need to join, and you can cache/replicate most of your data.

The main physical issue I find today with this approach is that database engines have a limit in the number of columns they can store, and an approach like this one will require a large number of columns per table.

I wonder how a model like this will impact O/R mapping tools.

They can probably hide this kind of schema automatically by changing the semantics of delete/update, writing Order.Customer.Name should return the name in the Order row.

How would they handle object identity? Now if I have ‘Customer #1’ in memory, every reference to Customer #1 points to the same instance. This is because the object model is normalized. Now they should point to different read/only instances.

Normalization

Advertisements
Explore posts in the same categories: Uncategorized

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: