Rambling: Persistent Data Stores

There are many data storage systems (talking software here) that address different problems. Each of them is very good at the one (or few) use cases it solves, but all have drawbacks.


SQL Databases - Structured data that can be queried. They also enforce rigid data integrity and relationship constraints (foreign keys, etc). Views (materialized ones in particular) were added to speed up queries and avoid repeating joins. The main drawback is that, unless you buy an Oracle and an engineer to go with it, optimization and clustering / fault tolerance can be trying. Most DBs also carry features that rarely get used, which adds a bit of overhead.

(Less) Structured Datastores - Really these are slightly fancy key-value stores. Some offer certain features over others, but they generally work the same way: take a key, look up a value. That gives them a leg up on insertion speed (only one index to compute). The downside is the lack of the richness of SQL. The other con is that most are only a few years old and thus haven’t been placed in a vat of concrete like relational databases have…

I’m wondering, shouldn’t it be fairly easy to allow these two methodologies to work together side by side? I’m not talking about installing memcache (yes, I know this isn’t persistent) over here, hibernate with mysql over there, and cassandra over there… I’m talking about a unified server / interface / gui and protocol to access each.

With that being said, the actual functionality and features needed to access, create, and handle each of these stores can be different. I’m not looking for a JDO solution. Frankly, it seems like the more you try to make something so abstract that it can apply to every little thing, the more confusing and prone to misinterpretation it becomes. Obviously, or maybe not so, something as generic as @Entity could still be used, but it might be better to further define and directly explain the actual intent of the @Entity, i.e. @RelationalEntity or @DocumentEntity or @KeyValueEntity or @ViewCacheEntity.
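To make the annotation idea a bit more concrete, here is a minimal sketch in Java. Everything in it is hypothetical: @RelationalEntity, @DocumentEntity and @KeyValueEntity are not part of JPA, Hibernate, or any library I know of, and the Customer, CustomerProfile, SessionToken and StoreKinds classes are made up purely for illustration. It only shows how an intent-specific annotation could tell a framework which kind of store a class belongs in.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker annotations; none of these exist in JPA or Hibernate.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface RelationalEntity {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface DocumentEntity {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface KeyValueEntity {}

// Made-up example classes, each declaring the store it is meant for.
@RelationalEntity
class Customer { long id; String name; }

@DocumentEntity
class CustomerProfile { String json; }

@KeyValueEntity
class SessionToken { String key; String value; }

final class StoreKinds {
    // Reads the annotation via reflection and names the intended store.
    static String storeFor(Class<?> type) {
        if (type.isAnnotationPresent(RelationalEntity.class)) return "relational";
        if (type.isAnnotationPresent(DocumentEntity.class)) return "document";
        if (type.isAnnotationPresent(KeyValueEntity.class)) return "key-value";
        throw new IllegalArgumentException("No store annotation on " + type.getName());
    }

    public static void main(String[] args) {
        System.out.println(StoreKinds.storeFor(Customer.class));
        System.out.println(StoreKinds.storeFor(CustomerProfile.class));
        System.out.println(StoreKinds.storeFor(SessionToken.class));
    }
}
```

The point of separate annotations over one generic @Entity is that the intent is visible right on the class, and a framework could route each type to the right backend without a pile of extra configuration.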

What I would like to see would be a single config file, a single download, a single framework to handle these things. I don’t want to go over here and download memcache and over here to get mysql and over here to get some key-value store… One download, one simple backup, one place where the data lives on the disk, one thing to monitor.

Something better, which would eliminate the need for the above paragraph, would be to have this as a service (a la EC2). In such a case, true database engineers could fine-tune, maintain, and administer these things for the rest of us. And we could access these stores via SQL, REST, JSON, XML, SOAP, etc…

Furthermore, one of the big tricks to data management is the relationships: cascading, caching, relationship integrity, etc. Can we create a unified mapping infrastructure that would allow us to define all of these in one place?

Relationships:
- dependent parent-child
- one to one
- one to many
- many to one
- many to many
- cascade (save, update, delete)

But also: update views, documents, or key values based on data from a completely different part of the datastore.

- update views with this value
- delete views that relied on this relationship
- if something is inserted in the key-value map, also insert it into the object graph and also update x documents.

What this means is that one datapoint is defined once but can be used across many objects, views, documents, caches, etc. The copies don’t have to be references that point back to that specific datapoint (memory / storage is cheap), but if that datapoint changes somewhere, rules would be checked to ensure that wherever it is used, it is updated or the appropriate action is taken.
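As a rough sketch of what I mean by rules, here is a toy in-memory version in Java. It is only an illustration under heavy assumptions: UnifiedStore, onChange, and the key names are all invented for this post, nothing is persisted, and a real system would also need transactions, ordering, and failure handling. It just shows a write to the key-value side triggering rules that update a denormalized document and drop a cached view.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch: one in-memory "store" holding a key-value map,
// denormalized documents, and cached views, plus rules linking them.
final class UnifiedStore {
    final Map<String, String> keyValues = new HashMap<>();
    final Map<String, String> documents = new HashMap<>();
    final Map<String, String> viewCache = new HashMap<>();

    // Rules registered per key; each receives the store and the new value.
    private final Map<String, List<BiConsumer<UnifiedStore, String>>> rules = new HashMap<>();

    void onChange(String key, BiConsumer<UnifiedStore, String> rule) {
        rules.computeIfAbsent(key, k -> new ArrayList<>()).add(rule);
    }

    // Writes the value, then runs every rule registered for that key.
    void put(String key, String value) {
        keyValues.put(key, value);
        List<BiConsumer<UnifiedStore, String>> matching =
                rules.getOrDefault(key, Collections.emptyList());
        for (BiConsumer<UnifiedStore, String> rule : matching) {
            rule.accept(this, value);
        }
    }
}

final class UnifiedStoreDemo {
    public static void main(String[] args) {
        UnifiedStore store = new UnifiedStore();
        store.viewCache.put("view:customer-summary", "stale summary");

        // Rule: copy the new email into a denormalized document.
        store.onChange("customer:42:email",
                (s, v) -> s.documents.put("doc:customer:42", "{\"email\":\"" + v + "\"}"));
        // Rule: delete a cached view that relied on the old email.
        store.onChange("customer:42:email",
                (s, v) -> s.viewCache.remove("view:customer-summary"));

        store.put("customer:42:email", "a@example.com");
        System.out.println(store.documents);
        System.out.println(store.viewCache);
    }
}
```

The document keeps its own copy of the email rather than a reference back to the key-value entry, which is the "storage is cheap" trade-off: reads stay simple, and the rules carry the burden of keeping the copies in line.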

It would certainly be an undertaking, but a fun one. A datastore to end all datastores.

Just thinking out loud here… If someone runs across this and knows of such a datastore, let me know. In the meantime, I guess I’ll keep hacking hibernate and mysql to fill the needs of what other stores provide.

