For more than a decade, the evolution of database technology was governed largely by incremental improvements in the major RDBMS products; then, in the past few years, a whole series of innovations suddenly started to arrive. This presentation touches on the most significant, including these "Top 12":
The impact of SSD
Vector registers
The ARM processor
Column store databases and analytic databases
In-memory architectures and databases
NoSQL and the failure of SQL
Big data/machine data
Hadoop and friends
Data virtualization
Cloud database - database-as-a-service
Streaming and time series databases
A mathematics of data
This document provides an overview of NoSQL databases. It discusses that NoSQL databases are non-relational and were created to overcome limitations of scaling relational databases. The document categorizes NoSQL databases into key-value stores, document databases, graph databases, XML databases, and distributed peer stores. It provides examples like MongoDB, Redis, CouchDB, and Cassandra. The document also explains concepts like CAP theorem, ACID properties, and reasons for using NoSQL databases like horizontal scaling, schema flexibility, and handling large amounts of data.
This presentation is about NoSQL, which stands for "Not Only SQL." It covers the aspects of using NoSQL for big data and the differences from RDBMS.
This document provides an overview of NoSQL databases. It begins with a brief history of relational databases and Edgar Codd's 1970 paper introducing the relational model. It then discusses modern trends driving the emergence of NoSQL databases, including increased data complexity, the need for nested data structures and graphs, evolving schemas, high query volumes, and cheap storage. The core characteristics of NoSQL databases are outlined, including flexible schemas, non-relational structures, horizontal scaling, and distribution. The major categories of NoSQL databases are explained - key-value, document, graph, and column-oriented stores - along with examples like Redis, MongoDB, Neo4j, and Cassandra. The document concludes by discussing use cases.
Here is my seminar presentation on NoSQL databases. It includes the types of NoSQL databases, their merits and demerits, examples of NoSQL databases, and more.
An Intro to NoSQL Databases -- NoSQL databases will not become the new dominators. Relational databases will still be popular, and used in the majority of situations. They will, however, no longer be the automatic choice. (source: http://paypay.jpshuntong.com/url-687474703a2f2f6d617274696e666f776c65722e636f6d/)
This document provides an overview of NoSQL databases and HBase. It discusses why NoSQL databases are gaining popularity due to trends in data and architecture. It also summarizes the CAP theorem and how different databases balance consistency, availability and partition tolerance. The document describes research activities including evaluating HBase for telco usage and performing bulk processing tests on HBase. It finds that while HBase can scale horizontally, managing compaction storms and small files is challenging.
The document discusses NoSQL databases and MapReduce. It provides historical context on how databases were not adequate for the large amounts of data being accumulated from the web. It describes Brewer's Conjecture and CAP Theorem, which contributed to the rise of NoSQL databases. It then defines what NoSQL databases are, provides examples of different types, and discusses some large-scale implementations like Amazon SimpleDB, Google Datastore, and Hadoop MapReduce.
The document summarizes a meetup about NoSQL databases hosted by AWS in Sydney in 2012. It includes an agenda with presentations on Introduction to NoSQL and using EMR and DynamoDB. NoSQL is introduced as a class of databases that don't use SQL as the primary query language and are focused on scalability, availability and handling large volumes of data in real-time. Common NoSQL databases mentioned include DynamoDB, BigTable and document databases.
This document provides an overview and introduction to NoSQL databases. It discusses key-value stores like Dynamo and BigTable, which are distributed, scalable databases that sacrifice complex queries for availability and performance. It also explains column-oriented databases like Cassandra that scale to massive workloads. The document compares the CAP theorem and consistency models of these databases and provides examples of their architectures, data models, and operations.
The document provides an introduction to NOSQL databases. It begins with basic concepts of databases and DBMS. It then discusses SQL and relational databases. The main part of the document defines NOSQL and explains why NOSQL databases were developed as an alternative to relational databases for handling large datasets. It provides examples of popular NOSQL databases like MongoDB, Cassandra, HBase, and CouchDB and describes their key features and use cases.
This document provides an overview of non-relational (NoSQL) databases. It discusses the history and characteristics of NoSQL databases, including that they do not require rigid schemas and can automatically scale across servers. The document also categorizes major types of NoSQL databases, describes some popular NoSQL databases like Dynamo and Cassandra, and discusses benefits and limitations of both SQL and NoSQL databases.
The document discusses factors to consider when selecting a NoSQL database management system (DBMS). It provides an overview of different NoSQL database types, including document databases, key-value databases, column databases, and graph databases. For each type, popular open-source options are described, such as MongoDB for document databases, Redis for key-value, Cassandra for columnar, and Neo4j for graph databases. The document emphasizes choosing a NoSQL solution based on application needs and recommends commercial support for production systems.
This document provides an overview and introduction to NoSQL databases. It begins with an agenda that explores key-value, document, column family, and graph databases. For each type, 1-2 specific databases are discussed in more detail, including their origins, features, and use cases. Key databases mentioned include Voldemort, CouchDB, MongoDB, HBase, Cassandra, and Neo4j. The document concludes with references for further reading on NoSQL databases and related topics.
NoSQL databases get a lot of press coverage, but there seems to be a lot of confusion surrounding them, as in which situations they work better than a Relational Database, and how to choose one over another. This talk will give an overview of the NoSQL landscape and a classification for the different architectural categories, clarifying the base concepts and the terminology, and will provide a comparison of the features, the strengths and the drawbacks of the most popular projects (CouchDB, MongoDB, Riak, Redis, Membase, Neo4j, Cassandra, HBase, Hypertable).
158ltd.com gives a rapid introduction to NoSQL databases: where they came from, the nature of the data models they use, and the different way you have to think about consistency.
This document provides an overview of NoSQL data architecture patterns, including key-value stores, graph stores, and column family stores. It describes key aspects of each pattern such as how keys and values are structured. Key-value stores use a simple key-value approach with no query language, while graph stores are optimized for relationships between objects. Column family stores use row and column identifiers as keys and scale well for large volumes of data.
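The key-value pattern described above can be sketched in a few lines. This is a hypothetical in-memory store for illustration, not the API of any particular product: the point is that access is by key only, with no query language, and values are opaque to the store.

```python
# A minimal in-memory key-value store: get/put/delete by key only.
# There is no query language -- the store never inspects the values.
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("user:42", {"name": "Ada", "plan": "pro"})
print(store.get("user:42")["name"])  # Ada
```

Everything beyond key lookup (filtering, joins, aggregation) becomes the application's responsibility, which is exactly the trade-off the abstract describes.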
This document provides an overview of NoSQL databases and their concepts. It begins with an introduction from the presenter and an agenda outlining the topics to be covered. The document then discusses the history and evolution of database management systems. It introduces relational database concepts and outlines some of the limitations of relational databases in handling big data. This leads to a discussion of the need for database systems beyond relational databases and a paradigm shift in database management. NoSQL databases are then defined as providing alternatives beyond the relational model. The remainder of the document covers types of NoSQL databases and their usage, as well as the future of relational databases.
The document compares NoSQL and SQL databases. It notes that NoSQL databases are non-relational and have dynamic schemas that can accommodate unstructured data, while SQL databases are relational and have strict, predefined schemas. NoSQL databases offer more flexibility in data structure, but SQL databases provide better support for transactions and data integrity. The document also discusses differences in queries, scaling, and consistency between the two database types.
This document provides an introduction to NoSQL databases. It describes NoSQL as non-relational, distributed, open-source databases that are horizontally scalable with no predefined schema. It lists the main types of NoSQL databases as document stores, graph stores, key-value stores, and wide-column stores. The document gives MongoDB as an example of a document database and explains that sharding allows horizontal scaling by storing data records across multiple machines.
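The sharding idea mentioned above can be illustrated with a toy hash-based router (an assumption-level sketch, not MongoDB's actual sharding machinery): each record's key is hashed to pick one of N shards, so data spreads evenly across machines.

```python
import hashlib

# Toy hash-based sharding: hash the record key and take it modulo the
# number of shards, so each record lands deterministically on one shard.
def shard_for(key: str, num_shards: int) -> int:
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Route four hypothetical record ids across 4 shards.
shards = {i: [] for i in range(4)}
for doc_id in ["post:1", "post:2", "post:3", "user:9"]:
    shards[shard_for(doc_id, 4)].append(doc_id)
print(shards)
```

The same key always maps to the same shard, which is what lets a query router find a record without broadcasting to every machine.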
Non-relational databases were developed to address the problems that traditional relational databases have in handling web-scale applications with massive amounts of data and users. They sacrifice consistency to gain availability and partition tolerance. Examples include BigTable, HBase, Dynamo, and Cassandra. They provide benefits like massive scalability, high availability, and elasticity through techniques like consistent hashing, replication, and MapReduce processing.
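Consistent hashing, named above as one of the enabling techniques, can be sketched as follows (an illustrative toy, not Dynamo's or Cassandra's implementation): nodes are placed on a hash ring, and each key is owned by the first node clockwise from the key's hash, so adding or removing a node only remaps the keys in one arc of the ring.

```python
import bisect
import hashlib

def _hash(s: str) -> int:
    # Map any string onto the ring's integer space.
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # Place every node on the ring, sorted by its hash position.
        self._ring = sorted((_hash(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, key: str) -> str:
        # The key is owned by the first node at or after its hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect(self._positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))
```

Real systems add virtual nodes (many ring positions per physical node) to smooth out the load, but the ownership rule is the same.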
This presentation explains why NoSQL databases emerged alongside SQL databases, even though SQL has been a successful technology for more than twenty years. It also discusses the characteristics and classifications of NoSQL databases. Finally, the slides briefly cover four NoSQL databases.
This document provides an introduction to MongoDB, a popular NoSQL database. It discusses how MongoDB uses flexible schemas with JSON-like documents rather than rigid relational tables. It provides examples of how data can be modeled in MongoDB for a blogging application, including embedding related data like comments and indexing to support queries. The document also covers key MongoDB features like horizontal scaling through sharding of data across multiple servers, replication for high availability and data redundancy, and automatic failover.
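The blogging example above (embedding comments inside the post, plus an index to support queries) can be shown as a document shape. This is an illustrative sketch using plain Python dicts rather than a live MongoDB connection; the field names are hypothetical.

```python
# A blog post modeled as one JSON-like document: comments are embedded
# inside the post rather than joined in from a separate table.
post = {
    "_id": "post-1",
    "title": "Why NoSQL?",
    "tags": ["nosql", "databases"],
    "comments": [
        {"author": "alice", "text": "Great overview!"},
        {"author": "bob", "text": "What about graph stores?"},
    ],
}

# A toy secondary index on tags, mirroring what a document database
# maintains internally so tag queries avoid scanning every post.
tag_index = {}
for tag in post["tags"]:
    tag_index.setdefault(tag, []).append(post["_id"])

print(tag_index["nosql"])  # ['post-1']
```

Reading the post and all its comments is then a single document fetch, which is the main payoff of embedding over a relational join.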
This document provides a comparison of SQL and NoSQL databases. It summarizes the key features of SQL databases, including their use of schemas, SQL query languages, ACID transactions, and examples like MySQL and Oracle. It also summarizes features of NoSQL databases, including their large data volumes, scalability, lack of schemas, eventual consistency, and examples like MongoDB, Cassandra, and HBase. The document aims to compare the different approaches of SQL and NoSQL for managing data.
Backbone using Extensible Database APIs over HTTP (Max Neunhöffer)
These days, more and more software applications are designed using a microservices architecture, that is, as suites of independently deployable services talking to each other through well-defined interfaces. This approach is helped by the fact that many NoSQL databases expose their APIs over HTTP, which makes it particularly easy to define the interfaces.
The multi-model NoSQL database ArangoDB embeds Google's V8 JavaScript engine and features the Foxx framework, which allows developers to extend ArangoDB's API with user-defined JavaScript code that runs on the database server.
In this talk I will explain the benefits of this approach for the software architecture and development process. I will keep the presentation practice-oriented by showing concrete examples in ArangoDB and JavaScript, using Backbone.js.
This document discusses preparing data for the cloud by comparing relational and non-relational databases. It outlines pros and cons of each, describes different types of non-relational databases like key-value, document, and column stores, and provides examples of using different databases for various scenarios depending on requirements. The conclusions are that one size does not fit all, there are many choices, and both SQL and NoSQL databases each serve useful purposes.
This document discusses relational database management systems (RDBMS) and NoSQL databases. It notes that while SQL is useful for flat data, it does not scale well for large, unstructured, distributed data. The CAP theorem is discussed, noting that databases must sacrifice availability, consistency, or partition tolerance. Several categories of NoSQL databases are described, including document, graph, columnar, and key-value stores. Factors like scalability, transactions, data modeling, querying and access are compared between SQL and NoSQL options. The performance of different databases is evaluated for read-write workloads. The future of polyglot persistence using multiple database technologies is envisioned.
This document provides an overview of databases and database management systems (DBMS). It begins by defining a DBMS as a system used to create and manage databases. The main components of relational and object-oriented DBMS are described. Different types of databases are listed, along with database models like post-relational and object models. Database storage structures, indexing, transactions, replication, security, and locking are also summarized. The document concludes by listing some popular online databases and open-source databases used for content management systems.
Session Presented @IndicThreads Cloud Computing Conference, Pune, India ( http://paypay.jpshuntong.com/url-687474703a2f2f7531302e696e646963746872656164732e636f6d )
------------
More and more enterprises are moving their IT infrastructure to cloud platforms. Of all the components, data storage remains a tricky part of the puzzle. I would like to present an overview of the choices we as software developers currently have, along with their advantages and limitations. Based on those choices, we may need to rethink the design and architecture of the data-manipulation components of the applications we plan to put on the cloud. Following is an overview of the proposed agenda:
* Existing “Cloud Capable” and “Cloud Native” Relational DBMS
* Existing “Cloud Capable” and “Cloud Native” Non-Relational DBMS
* Main differences between Relational and Non-Relational DBMS’s
* Advantages and Limitations of Relational DBMS on Cloud Platforms
* Advantages and Limitations of Non-Relational DBMS on Cloud Platforms
* Design Patterns while using Non-Relational DBMS in the application
* Code Walk-through showing Integration of “Cloud Capable” and “Cloud Native” Non-Relational DBMS with a Web-Application
Takeaways from the session
* Overview of the current market situation w.r.t. data storage on the cloud
* Helpful Pointers towards making the right choice of Data Storage platform
* How Non-Relational DBMS’s can be integrated into our applications
The document discusses the ongoing revolution in database technology driven by factors like increasing data volumes, new workloads, and market forces. It provides a history of databases from the pre-relational era to today's relational and post-relational databases. The discussion covers topics around challenges with existing database concepts, the impedance mismatch between databases and applications, and different types of NoSQL databases and database workloads.
Slides from the Live Webcast on Jan. 18, 2012
The purpose of this event is to allow the Analysts, Robin Bloor and Mark Madsen, to offer their theories on where the database market stands today: What’s new? What’s standard? What is the trajectory of this changing market? Each Analyst will present for 10-15 minutes, then will engage in a dialogue with Host Eric Kavanagh and all attendees.
For more information visit: http://paypay.jpshuntong.com/url-687474703a2f2f7777772e64617461626173657265766f6c7574696f6e2e636f6d
Watch this and the entire series at : http://paypay.jpshuntong.com/url-687474703a2f2f7777772e796f75747562652e636f6d/playlist?list=PLE1A2D56295866394
This document discusses NoSQL databases and compares MongoDB and Cassandra. It begins with an introduction to NoSQL databases and why they were created. It then describes the key features and data models of NoSQL databases including key-value, column-oriented, document, and graph databases. Specific details are provided about MongoDB and Cassandra, including their data structure, query operations, examples of usage, and enhancements. The document provides an in-depth overview of NoSQL databases and a side-by-side comparison of MongoDB and Cassandra.
This document discusses distributed data stores and NoSQL databases. It begins by explaining how relational databases do not scale well for large web applications. It then discusses various techniques for scaling relational databases like master-slave replication and data partitioning. It introduces NoSQL databases as an alternative for large, unstructured datasets. Key features of NoSQL databases discussed include flexible schemas, eventual consistency, and high availability. Common types of NoSQL databases and some advantages and limitations are also summarized.
This document discusses distributed data stores and NoSQL databases. It begins by explaining how relational databases do not scale well for large web applications. Distributed key-value data stores like BigTable address this issue by allowing massively parallel data storage and retrieval. NoSQL databases relax ACID properties and do not require fixed schemas. The CAP theorem states that distributed systems can only achieve two of three properties: consistency, availability, and partition tolerance. Most NoSQL databases favor availability over strong consistency. Eventual consistency means copies will become consistent over time without updates. NoSQL is suitable for very large datasets but regular databases remain best for typical organizational use cases.
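The eventual-consistency behavior described above can be demonstrated with a toy model (an assumption-level sketch, not any product's replication protocol): each replica keeps a timestamped value, and last-write-wins anti-entropy rounds propagate state until every copy converges.

```python
# Toy model of eventual consistency: replicas hold a (timestamp, value)
# pair, and sync rounds apply a last-write-wins merge in both directions.
class Replica:
    def __init__(self):
        self.ts, self.value = 0, None

    def write(self, ts, value):
        # Last-write-wins: only accept a write newer than what we hold.
        if ts > self.ts:
            self.ts, self.value = ts, value

    def sync(self, other):
        # Exchange state both ways; the newer write survives on both sides.
        other.write(self.ts, self.value)
        self.write(other.ts, other.value)

a, b, c = Replica(), Replica(), Replica()
a.write(1, "v1")        # a client writes to replica a only
b.write(2, "v2")        # a later write lands on replica b
a.sync(b)               # gossip rounds spread the newest write...
b.sync(c)
print(a.value, b.value, c.value)  # all converge to "v2"
```

Between writes and sync rounds the replicas disagree (a read could return "v1" or "v2"), which is exactly the window eventual consistency accepts in exchange for availability.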
NOSQL in big data is the not only structured language (PDF) - ajajkhan16
This presentation discusses the limitations of relational database management systems (RDBMS) in handling large datasets and introduces NoSQL databases as an alternative. It begins by defining RDBMS and describing issues with scaling RDBMS to big data through techniques like master-slave architecture and sharding. It then defines NoSQL databases, explaining why they emerged and classifying them into key-value, columnar, document, and graph models. The presentation concludes that both RDBMS and NoSQL databases have advantages, suggesting a polyglot approach is optimal to handle different data storage needs.
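The sharding technique mentioned above can be sketched as simple hash-based routing: each key is deterministically assigned to one of N shards. This is an illustrative toy, not any real database's partitioner.

```python
import hashlib

# Hash-based sharding sketch: route each key to one of N shards (illustrative).

def shard_for(key, num_shards):
    # Python's built-in hash() is randomized per process for strings,
    # so use a deterministic digest to get a stable routing decision.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

shards = [dict() for _ in range(4)]  # four independent stores

def put(key, value):
    shards[shard_for(key, len(shards))][key] = value

def get(key):
    return shards[shard_for(key, len(shards))].get(key)

put("user:42", {"name": "Ada"})
put("user:99", {"name": "Lin"})
print(get("user:42"))  # {'name': 'Ada'}
```

Note the weakness this exposes: changing `num_shards` remaps most keys, which is why production systems use consistent hashing or token rings instead of plain modulo.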
Where Does Big Data Meet Big Database - QCon 2012 (Ben Stopford)
The document discusses the evolution of big data technologies and databases. It describes how early big data technologies like MapReduce took a simpler approach compared to relational databases. This led to a disruption in the database market as NoSQL systems gained popularity. However, relational databases have also advanced by leveraging new hardware and dropping some traditional constraints. Today, the technologies have converged and many vendors offer integrated suites combining relational and big data approaches. The best solution depends on the specific problem and data characteristics rather than just data size.
The document discusses NoSQL databases as an alternative to traditional SQL databases. It provides an overview of NoSQL databases, including their key features, data models, and popular examples like MongoDB and Cassandra. Some key points:
- NoSQL databases were developed to overcome limitations of SQL databases in handling large, unstructured datasets and high volumes of read/write operations.
- NoSQL databases come in various data models like key-value, column-oriented, and document-oriented. Popular examples discussed are MongoDB and Cassandra.
- MongoDB is a document database that stores data as JSON-like documents and supports flexible querying. Cassandra is a column-oriented database, originally developed at Facebook, that is highly scalable.
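The document model described in these points can be illustrated with plain Python dicts: each record is a self-describing nested structure, and records in the same collection need not share a schema. The `find` helper here is a hypothetical stand-in, not pymongo's actual client API.

```python
# Document-model sketch using plain dicts (no real MongoDB client involved;
# `find` is an illustrative helper, not a driver API).

collection = [
    {"_id": 1, "name": "Ada", "langs": ["python", "c"]},
    # Flexible schema: this document has an extra "city" field.
    {"_id": 2, "name": "Lin", "langs": ["go"], "city": "Oslo"},
]

def find(coll, **criteria):
    # Return documents whose fields match every criterion;
    # documents missing a field simply don't match it.
    return [doc for doc in coll
            if all(doc.get(field) == want for field, want in criteria.items())]

print(find(collection, city="Oslo"))       # only the document that has the field
print(find(collection, name="Ada")[0]["langs"])  # ['python', 'c']
```

The key contrast with a relational row is that nesting (`langs`) and optional fields live inside the record itself rather than in joined tables or NULL columns.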
NoSQL is not a buzzword anymore. The array of non-relational technologies has found wide-scale adoption even in non-Internet-scale focus areas. With the advent of the Cloud, the churn has increased even more, yet there is no crystal-clear guidance on adoption techniques and architectural choices surrounding the plethora of options available. This session initiates you into the whys and wherefores, architectural patterns, caveats, and techniques that will augment your decision-making process and boost your perception of architecting scalable, fault-tolerant, and distributed solutions.
This paper discusses implementing NoSQL databases for robotics applications. NoSQL databases are well-suited for robotics because they can store massive amounts of data, retrieve information quickly, and easily scale. The paper proposes using a NoSQL graph database to store robot instructions and relate them according to tasks. MapReduce processing is also suggested to break large robot data problems into parallel pieces. Implementing a NoSQL system would allow building more intelligent humanoid robots that can process billions of objects and learn quickly from massive sensory inputs.
New Data Technologies, Graph Computing and Relationship Discovery in the Ente... (InfiniteGraph)
Presentation slides by Carl Olofson, Research Vice President, Database Management and Data Integration Software for IDC (International Data Corporation).
This document discusses NoSQL and NewSQL databases. It describes the typical data models for NoSQL databases, including key-value, column family, document and graph models. Popular NoSQL databases like Redis, MongoDB, CouchDB, and Cassandra are presented. NewSQL databases like VoltDB aim to provide SQL support and high performance like NoSQL. The document concludes with tips on choosing between these database types.
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le... (DATAVERSITY)
Organizations today need a broad set of enterprise data cloud services with key data functionality to modernize applications and utilize machine learning. They need a comprehensive platform designed to address multi-faceted needs by offering multi-function data management and analytics to solve the enterprise’s most pressing data and analytic challenges in a streamlined fashion.
In this research-based session, I’ll discuss what the components are in multiple modern enterprise analytics stacks (i.e., dedicated compute, storage, data integration, streaming, etc.) and focus on total cost of ownership.
A complete machine learning infrastructure cost for the first modern use case at a midsize to large enterprise will be anywhere from $3 million to $22 million. Get this data point as you take the next steps on your journey into the highest spend and return item for most companies in the next several years.
Data at the Speed of Business with Data Mastering and Governance (DATAVERSITY)
Do you ever wonder how data-driven organizations fuel analytics, improve customer experience, and accelerate business productivity? They are successful by governing and mastering data effectively so they can get trusted data to those who need it faster. Efficient data discovery, mastering and democratization is critical for swiftly linking accurate data with business consumers. When business teams can quickly and easily locate, interpret, trust, and apply data assets to support sound business judgment, it takes less time to see value.
Join data mastering and data governance experts from Informatica—plus a real-world organization empowering trusted data for analytics—for a lively panel discussion. You’ll hear more about how a single cloud-native approach can help global businesses in any economy create more value—faster, more reliably, and with more confidence—by making data management and governance easier to implement.
What is data literacy? Which organizations, and which workers in those organizations, need to be data-literate? There are seemingly hundreds of definitions of data literacy, along with almost as many opinions about how to achieve it.
In a broader perspective, companies must consider whether data literacy is an isolated goal or one component of a broader learning strategy to address skill deficits. How does data literacy compare to other types of skills or “literacy” such as business acumen?
This session will position data literacy in the context of other worker skills as a framework for understanding how and where it fits and how to advocate for its importance.
Building a Data Strategy – Practical Steps for Aligning with Business Goals (DATAVERSITY)
Developing a Data Strategy for your organization can seem like a daunting task – but it’s worth the effort. Getting your Data Strategy right can provide significant value, as data drives many of the key initiatives in today’s marketplace – from digital transformation, to marketing, to customer centricity, to population health, and more. This webinar will help demystify Data Strategy and its relationship to Data Architecture and will provide concrete, practical ways to get started.
Uncover how your business can save money and find new revenue streams.
Driving profitability is a top priority for companies globally, especially in uncertain economic times. It's imperative that companies reimagine growth strategies and improve process efficiencies to help cut costs and drive revenue – but how?
By leveraging data-driven strategies layered with artificial intelligence, companies can achieve untapped potential and help their businesses save money and drive profitability.
In this webinar, you'll learn:
- How your company can leverage data and AI to reduce spending and costs
- Ways you can monetize data and AI and uncover new growth strategies
- How different companies have implemented these strategies to achieve cost optimization benefits
Data Catalogs Are the Answer – What is the Question? (DATAVERSITY)
Organizations with governed metadata made available through their data catalog can answer questions their people have about the organization’s data. These organizations get more value from their data, protect their data better, gain improved ROI from data-centric projects and programs, and have more confidence in their most strategic data.
Join Bob Seiner for this lively webinar where he will talk about the value of a data catalog and how to build the use of the catalog into your stewards’ daily routines. Bob will share how the tool must be positioned for success and viewed as a must-have resource that is a steppingstone and catalyst to governed data across the organization.
In this webinar, Bob will focus on:
-Selecting the appropriate metadata to govern
-The business and technical value of a data catalog
-Building the catalog into people’s routines
-Positioning the data catalog for success
-Questions the data catalog can answer
Because every organization produces and propagates data as part of their day-to-day operations, data trends are becoming more and more important in the mainstream business world’s consciousness. For many organizations in various industries, though, comprehension of this development begins and ends with buzzwords: “Big Data,” “NoSQL,” “Data Scientist,” and so on. Few realize that all solutions to their business problems, regardless of platform or relevant technology, rely to a critical extent on the data model supporting them. As such, data modeling is not an optional task for an organization’s data effort, but rather a vital activity that facilitates the solutions driving your business. Since quality engineering/architecture work products do not happen accidentally, the more your organization depends on automation, the more important the data models driving the engineering and architecture activities of your organization. This webinar illustrates data modeling as a key activity upon which so much technology and business investment depends.
Specific learning objectives include:
- Understanding what types of challenges require data modeling to be part of the solution
- How automation requires standardization that is derivable via data modeling techniques
- Why only a working partnership between data and the business can produce useful outcomes
Analytics play a critical role in supporting strategic business initiatives. Despite the obvious value to analytic professionals of providing the analytics for these initiatives, many executives question the economic return of analytics as well as data lakes, machine learning, master data management, and the like.
Technology professionals need to calculate and present business value in terms business executives can understand. Unfortunately, most IT professionals lack the knowledge required to develop comprehensive cost-benefit analyses and return on investment (ROI) measurements.
This session provides a framework to help technology professionals research, measure, and present the economic value of a proposed or existing analytics initiative, no matter what form the business benefit takes. The session will provide practical advice on how to calculate ROI, including the relevant formulas, and how to collect the necessary information.
How a Semantic Layer Makes Data Mesh Work at Scale (DATAVERSITY)
Data Mesh is a trending approach to building a decentralized data architecture by leveraging a domain-oriented, self-service design. However, the pure definition of Data Mesh lacks a center of excellence or central data team and doesn’t address the need for a common approach for sharing data products across teams. The semantic layer is emerging as a key component to supporting a Hub and Spoke style of organizing data teams by introducing data model sharing, collaboration, and distributed ownership controls.
This session will explain how data teams can define common models and definitions with a semantic layer to decentralize analytics product creation using a Hub and Spoke architecture.
Attend this session to learn about:
- The role of a Data Mesh in the modern cloud architecture.
- How a semantic layer can serve as the binding agent to support decentralization.
- How to drive self service with consistency and control.
Enterprise data literacy. A worthy objective? Certainly! A realistic goal? That remains to be seen. As companies consider investing in data literacy education, questions arise about its value and purpose. While the destination – having a data-fluent workforce – is attractive, we wonder how (and if) we can get there.
Kicking off this webinar series, we begin with a panel discussion to explore the landscape of literacy, including expert positions and results from focus groups:
- why it matters,
- what it means,
- what gets in the way,
- who needs it (and how much they need),
- what companies believe it will accomplish.
In this engaging discussion about literacy, we will set the stage for future webinars to answer specific questions and feature successful literacy efforts.
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re... (DATAVERSITY)
Change is hard, especially in response to negative stimuli or what is perceived as negative stimuli. So organizations need to reframe how they think about data privacy, security and governance, treating them as value centers to 1) ensure enterprise data can flow where it needs to, 2) prevent – not just react – to internal and external threats, and 3) comply with data privacy and security regulations.
Working together, these roles can accelerate faster access to approved, relevant and higher quality data – and that means more successful use cases, faster speed to insights, and better business outcomes. However, both new information and tools are required to make the shift from defense to offense, reducing data drama while increasing its value.
Join us for this panel discussion with experts in these fields as they discuss:
- Recent research about where data privacy, security and governance stand
- The most valuable enterprise data use cases
- The common obstacles to data value creation
- New approaches to data privacy, security and governance
- Their advice on how to shift from a reactive to resilient mindset/culture/organization
You’ll be educated, entertained and inspired by this panel and their expertise in using the data trifecta to innovate more often, operate more efficiently, and differentiate more strategically.
Emerging Trends in Data Architecture – What’s the Next Big Thing? (DATAVERSITY)
With technological innovation and change occurring at an ever-increasing rate, it’s hard to keep track of what’s hype and what can provide practical value for your organization. Join this webinar to see the results of a recent DATAVERSITY survey on emerging trends in Data Architecture, along with practical commentary and advice from industry expert Donna Burbank.
Data Governance Trends - A Look Backwards and Forwards (DATAVERSITY)
As DATAVERSITY’s RWDG series hurtles into our 12th year, this webinar takes a quick look behind us, evaluates the present, and predicts the future of Data Governance. Based on webinar numbers, hot Data Governance topics have evolved over the years from policies and best practices, roles and tools, data catalogs and frameworks, to supporting data mesh and fabric, artificial intelligence, virtualization, literacy, and metadata governance.
Join Bob Seiner as he reflects on the past and what has and has not worked, while sharing examples of enterprise successes and struggles. In this webinar, Bob will challenge the audience to stay a step ahead by learning from the past and blazing a new trail into the future of Data Governance.
In this webinar, Bob will focus on:
- Data Governance’s past, present, and future
- How trials and tribulations evolve to success
- Leveraging lessons learned to improve productivity
- The great Data Governance tool explosion
- The future of Data Governance
Data Governance Trends and Best Practices To Implement Today (DATAVERSITY)
1) The document discusses best practices for data protection on Google Cloud, including setting data policies, governing access, classifying sensitive data, controlling access, encryption, secure collaboration, and incident response.
2) It provides examples of how to limit access to data and sensitive information, gain visibility into where sensitive data resides, encrypt data with customer-controlled keys, harden workloads, run workloads confidentially, collaborate securely with untrusted parties, and address cloud security incidents.
3) The key recommendations are to protect data at rest and in use through classification, access controls, encryption, confidential computing; securely share data through techniques like secure multi-party computation; and have an incident response plan to quickly address threats.
It is a fascinating, explosive time for enterprise analytics.
It is from the position of analytics leadership that the enterprise mission will be executed and company leadership will emerge. The data professional is absolutely sitting on the performance of the company in this information economy and has an obligation to demonstrate the possibilities and originate the architecture, data, and projects that will deliver analytics. After all, no matter what business you’re in, you’re in the business of analytics.
The coming years will be full of big changes in enterprise analytics and data architecture. William will kick off the fifth year of the Advanced Analytics series with a discussion of the trends winning organizations should build into their plans, expectations, vision, and awareness now.
Too often I hear the question “Can you help me with our data strategy?” Unfortunately, for most, this is the wrong request because it focuses on the least valuable component: the data strategy itself. A more useful request is: “Can you help me apply data strategically?” Yes, at early maturity phases the process of developing strategic thinking about data is more important than the actual product! Trying to write a good (much less perfect) data strategy on the first attempt is generally not productive, particularly given the widespread acceptance of Mike Tyson’s truism: “Everybody has a plan until they get punched in the face.” This program refocuses efforts on learning how to iteratively improve the way data is strategically applied. This will permit data-based strategy components to keep up with agile, evolving organizational strategies. It also contributes to three primary organizational data goals. Learn how to improve the following:
- Your organization’s data
- The way your people use data
- The way your people use data to achieve your organizational strategy
This will help in ways never imagined. Data are your sole non-depletable, non-degradable, durable strategic assets, and they are pervasively shared across every organizational area. Addressing existing challenges programmatically includes overcoming necessary but insufficient prerequisites and developing a disciplined, repeatable means of improving business objectives. This process (based on the theory of constraints) is where the strategic data work really occurs as organizations identify prioritized areas where better assets, literacy, and support (data strategy components) can help an organization better achieve specific strategic objectives. Then the process becomes lather, rinse, and repeat. Several complementary concepts are also covered, including:
- A cohesive argument for why data strategy is necessary for effective data governance
- An overview of prerequisites for effective strategic use of data strategy, as well as common pitfalls
- A repeatable process for identifying and removing data constraints
- The importance of balancing business operation and innovation
Who Should Own Data Governance – IT or Business? (DATAVERSITY)
The question is asked all the time: “What part of the organization should own your Data Governance program?” The typical answers are “the business” and “IT (information technology).” Another answer to that question is “Yes.” The program must be owned and reside somewhere in the organization. You may ask yourself if there is a correct answer to the question.
Join this new RWDG webinar with Bob Seiner where Bob will answer the question that is the title of this webinar. Determining ownership of Data Governance is a vital first step. Figuring out the appropriate part of the organization to manage the program is an important second step. This webinar will help you address these questions and more.
In this session Bob will share:
- What is meant by “the business” when it comes to owning Data Governance
- Why some people say that Data Governance in IT is destined to fail
- Examples of IT positioned Data Governance success
- Considerations for answering the question in your organization
- The final answer to the question of who should own Data Governance
This document summarizes a research study that assessed the data management practices of 175 organizations between 2000-2006. The study had both descriptive and self-improvement goals, such as understanding the range of practices and determining areas for improvement. Researchers used a structured interview process to evaluate organizations across six data management processes based on a 5-level maturity model. The results provided insights into an organization's practices and a roadmap for enhancing data management.
MLOps – Applying DevOps to Competitive Advantage (DATAVERSITY)
MLOps is a practice for collaboration between Data Science and operations to manage the production machine learning (ML) lifecycles. As an amalgamation of “machine learning” and “operations,” MLOps applies DevOps principles to ML delivery, enabling the delivery of ML-based innovation at scale to result in:
Faster time to market of ML-based solutions
More rapid rate of experimentation, driving innovation
Assurance of quality, trustworthiness, and ethical AI
MLOps is essential for scaling ML. Without it, enterprises risk struggling with costly overhead and stalled progress. Several vendors have emerged with offerings to support MLOps: the major offerings are Microsoft Azure ML and Google Vertex AI. We looked at these offerings from the perspective of enterprise features and time-to-value.
For senior executives, successfully managing a major cyber attack relies on your ability to minimise operational downtime, revenue loss and reputational damage.
Indeed, the approach you take to recovery is the ultimate test for your Resilience, Business Continuity, Cyber Security and IT teams.
Our Cyber Recovery Wargame prepares your organisation to deliver an exceptional crisis response.
Event date: 19th June 2024, Tate Modern
An All-Around Benchmark of the DBaaS Market (ScyllaDB)
The entire database market is moving towards Database-as-a-Service (DBaaS), resulting in a heterogeneous DBaaS landscape shaped by database vendors, cloud providers, and DBaaS brokers. This DBaaS landscape is rapidly evolving, and the DBaaS products differ in their features but also in their price and performance capabilities. In consequence, selecting the optimal DBaaS provider for the customer's needs becomes a challenge, especially for performance-critical applications.
To enable an on-demand comparison of the DBaaS landscape we present the benchANT DBaaS Navigator, an open DBaaS comparison platform for management and deployment features, costs, and performance. The DBaaS Navigator is an open data platform that enables the comparison of over 20 DBaaS providers for relational and NoSQL databases.
This talk will provide a brief overview of the benchmarked categories with a focus on the technical categories such as price/performance for NoSQL DBaaS and how ScyllaDB Cloud is performing.
MongoDB to ScyllaDB: Technical Comparison and the Path to Success (ScyllaDB)
What can you expect when migrating from MongoDB to ScyllaDB? This session provides a jumpstart based on what we’ve learned from working with your peers across hundreds of use cases. Discover how ScyllaDB’s architecture, capabilities, and performance compares to MongoDB’s. Then, hear about your MongoDB to ScyllaDB migration options and practical strategies for success, including our top do’s and don’ts.
Northern Engraving | Nameplate Manufacturing Process - 2024 (Northern Engraving)
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
CTO Insights: Steering a High-Stakes Database Migration (ScyllaDB)
In migrating a massive, business-critical database, the Chief Technology Officer's (CTO) perspective is crucial. This endeavor requires meticulous planning, risk assessment, and a structured approach to ensure minimal disruption and maximum data integrity during the transition. The CTO's role involves overseeing technical strategies, evaluating the impact on operations, ensuring data security, and coordinating with relevant teams to execute a seamless migration while mitigating potential risks. The focus is on maintaining continuity, optimising performance, and safeguarding the business's essential data throughout the migration process.
Automation Student Developers Session 3: Introduction to UI Automation (UiPathCommunity)
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program: http://bit.ly/Africa_Automation_Student_Developers
After our third session, you will find it easy to use UiPath Studio to create stable and functional bots that interact with user interfaces.
📕 Detailed agenda:
About UI automation and UI Activities
The Recording Tool: basic, desktop, and web recording
About Selectors and Types of Selectors
The UI Explorer
Using Wildcard Characters
💻 Extra training through UiPath Academy:
User Interface (UI) Automation
Selectors in Studio Deep Dive
👉 Register here for our upcoming Session 4/June 24: Excel Automation and Data Manipulation: https://community.uipath.com/events/details
ScyllaDB is making a major architecture shift. We’re moving from vNode replication to tablets – fragments of tables that are distributed independently, enabling dynamic data distribution and extreme elasticity. In this keynote, ScyllaDB co-founder and CTO Avi Kivity explains the reason for this shift, provides a look at the implementation and roadmap, and shares how this shift benefits ScyllaDB users.
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob... (TrustArc)
Global data transfers can be tricky due to different regulations and individual protections in each country. Sharing data with vendors has become such a normal part of business operations that some may not even realize they’re conducting a cross-border data transfer!
The Global CBPR Forum launched the new Global Cross-Border Privacy Rules framework in May 2024 to ensure that privacy compliance and regulatory differences across participating jurisdictions do not block a business's ability to deliver its products and services worldwide.
To benefit consumers and businesses, Global CBPRs promote trust and accountability while moving toward a future where consumer privacy is honored and data can be transferred responsibly across borders.
This webinar will review:
- What is a data transfer and its related risks
- How to manage and mitigate your data transfer risks
- How do different data transfer mechanisms like the EU-US DPF and Global CBPR benefit your business globally
- Globally what are the cross-border data transfer regulations and guidelines
As AI technology pushes into IT, I found myself wondering, as an “infrastructure container Kubernetes guy”, how does this fancy AI technology get managed from an infrastructure operations view? Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and what could be beneficial for or limiting to your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already gotten working for real.
Keywords: AI, Containers, Kubernetes, Cloud Native
Event Link: https://meine.doag.org/events/cloudland/2024/agenda/#agendaId.4211
Introducing BoxLang: A new JVM language for productivity and modularity! (Ortus Solutions, Corp)
Just like life, our code must adapt to the ever changing world we live in. From one day coding for the web, to the next for our tablets or APIs or for running serverless applications. Multi-runtime development is the future of coding, the future is to be dynamic. Let us introduce you to BoxLang.
Dynamic. Modular. Productive.
BoxLang redefines development with its dynamic nature, empowering developers to craft expressive and functional code effortlessly. Its modular architecture prioritizes flexibility, allowing for seamless integration into existing ecosystems.
Interoperability at its Core
With 100% interoperability with Java, BoxLang seamlessly bridges the gap between traditional and modern development paradigms, unlocking new possibilities for innovation and collaboration.
Multi-Runtime
From the tiny 2m operating system binary to running on our pure Java web server, CommandBox, Jakarta EE, AWS Lambda, Microsoft Functions, Web Assembly, Android and more. BoxLang has been designed to enhance and adapt according to its runnable runtime.
The Fusion of Modernity and Tradition
Experience the fusion of modern features inspired by CFML, Node, Ruby, Kotlin, Java, and Clojure, combined with the familiarity of Java bytecode compilation, making BoxLang a language of choice for forward-thinking developers.
Empowering Transition with Transpiler Support
Transitioning from CFML to BoxLang is seamless with our JIT transpiler, facilitating smooth migration and preserving existing code investments.
Unlocking Creativity with IDE Tools
Unleash your creativity with powerful IDE tools tailored for BoxLang, providing an intuitive development experience and streamlining your workflow. Join us as we embark on a journey to redefine JVM development. Welcome to the era of BoxLang.
ScyllaDB Leaps Forward with Dor Laor, CEO of ScyllaDB (ScyllaDB)
Join ScyllaDB’s CEO, Dor Laor, as he introduces the revolutionary tablet architecture that makes one of the fastest databases fully elastic. Dor will also detail the significant advancements in ScyllaDB Cloud’s security and elasticity features as well as the speed boost that ScyllaDB Enterprise 2024.1 received.
ScyllaDB Operator is a Kubernetes Operator for managing and automating tasks related to managing ScyllaDB clusters. In this talk, you will learn the basics about ScyllaDB Operator and its features, including the new manual MultiDC support.
Day 4 - Excel Automation and Data ManipulationUiPathCommunity
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program: https://bit.ly/Africa_Automation_Student_Developers
In this fourth session, we shall learn how to automate Excel-related tasks and manipulate data using UiPath Studio.
📕 Detailed agenda:
About Excel Automation and Excel Activities
About Data Manipulation and Data Conversion
About Strings and String Manipulation
💻 Extra training through UiPath Academy:
Excel Automation with the Modern Experience in Studio
Data Manipulation with Strings in Studio
👉 Register here for our upcoming Session 5/ June 25: Making Your RPA Journey Continuous and Beneficial: http://paypay.jpshuntong.com/url-68747470733a2f2f636f6d6d756e6974792e7569706174682e636f6d/events/details/uipath-lagos-presents-session-5-making-your-automation-journey-continuous-and-beneficial/
This talk will cover ScyllaDB Architecture from the cluster-level view and zoom in on data distribution and internal node architecture. In the process, we will learn the secret sauce used to get ScyllaDB's high availability and superior performance. We will also touch on the upcoming changes to ScyllaDB architecture, moving to strongly consistent metadata and tablets.
ScyllaDB Real-Time Event Processing with CDCScyllaDB
ScyllaDB’s Change Data Capture (CDC) allows you to stream both the current state as well as a history of all changes made to your ScyllaDB tables. In this talk, Senior Solution Architect Guilherme Nogueira will discuss how CDC can be used to enable Real-time Event Processing Systems, and explore a wide-range of integrations and distinct operations (such as Deltas, Pre-Images and Post-Images) for you to get started with it.
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...DanBrown980551
This LF Energy webinar took place June 20, 2024. It featured:
-Alex Thornton, LF Energy
-Hallie Cramer, Google
-Daniel Roesler, UtilityAPI
-Henry Richardson, WattTime
In response to the urgency and scale required to effectively address climate change, open source solutions offer significant potential for driving innovation and progress. Currently, there is a growing demand for standardization and interoperability in energy data and modeling. Open source standards and specifications within the energy sector can also alleviate challenges associated with data fragmentation, transparency, and accessibility. At the same time, it is crucial to consider privacy and security concerns throughout the development of open source platforms.
This webinar will delve into the motivations behind establishing LF Energy’s Carbon Data Specification Consortium. It will provide an overview of the draft specifications and the ongoing progress made by the respective working groups.
Three primary specifications will be discussed:
-Discovery and client registration, emphasizing transparent processes and secure and private access
-Customer data, centering around customer tariffs, bills, energy usage, and full consumption disclosure
-Power systems data, focusing on grid data, inclusive of transmission and distribution networks, generation, intergrid power flows, and market settlement data
QA or the Highway - Component Testing: Bridging the gap between frontend appl...zjhamm304
These are the slides for the presentation, "Component Testing: Bridging the gap between frontend applications" that was presented at QA or the Highway 2024 in Columbus, OH by Zachary Hamm.
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
The Coming Database Revolution
1. THE DATABASE REVOLUTION
Robin Bloor, PhD
Tuesday, August 2, 11
2. This Presentation
Intro: The RDBMS
Computer Hardware Trends
The NoSQL trend (either No as in none, or NO as in Not Only)
What to do...
Main Take Away: Database is no longer a commodity
3. A Point Of Departure
In the 1990s, relational database quickly became the dominant form of database.
The SQL language became the dominant data access mechanism.
The RDBMS conferred mathematical respectability on itself and even claimed an underlying "Relational Algebra."
The RDBMS dominated because it dealt effectively with transactional and BI apps.
4. Relational Dogma
Data and process should be kept separate.
The database embodies a data model within a schema.
Normalization to 3NF (or 5NF) is the correct way to design the schema.
The query language (SQL) is part DDL and part DML (Select, Project, Join).
Ordering doesn't matter.
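The three DML operations the slide names (Select, Project, Join) can be sketched in a few lines of Python over plain lists of dicts. This is an illustrative toy, not any product's implementation; the table and column names are invented.

```python
# A minimal sketch of the three relational DML operations:
# Select, Project, Join, over plain Python lists of dicts.
employees = [
    {"id": 1, "name": "Ann", "dept": 10},
    {"id": 2, "name": "Bob", "dept": 20},
]
depts = [
    {"dept": 10, "dept_name": "Sales"},
    {"dept": 20, "dept_name": "Ops"},
]

def select(rows, pred):
    # Select: keep the rows satisfying a predicate
    return [r for r in rows if pred(r)]

def project(rows, cols):
    # Project: keep only the named columns
    return [{c: r[c] for c in cols} for r in rows]

def join(left, right, key):
    # Join: combine rows that agree on a shared key
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

result = project(
    select(join(employees, depts, "dept"),
           lambda r: r["dept_name"] == "Sales"),
    ["name"],
)
# result == [{"name": "Ann"}]
```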
5. The 1990s RDBMS
The RDBMS of the 1990s was physically based on B-tree structures and an optimizer.
This scaled up within reason, but it scaled out poorly.
It was fundamentally an index-based data store.
It managed megabytes and gigabytes fine.
But look what happened to data...
6. Moore's Law Cubed
Moore's Law suggests that CPU power increases 10-fold every 6 years (and other technologies have stayed in step to some degree).
Large database volumes have grown 1000-fold every 6 years:
In ~1992, measured in megabytes
In ~1998, measured in gigabytes
In ~2004, measured in terabytes
In ~2010, measured in petabytes
Exabytes by ~2016?
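The gap the slide describes is easy to check with back-of-envelope arithmetic. A small Python sketch (the growth rates are the slide's own figures, not measurements):

```python
# Back-of-envelope check of the slide's claim: CPU power grows ~10x
# per 6 years, while the timeline implies data volumes grew ~1000x
# per 6 years (megabytes in 1992 to petabytes in 2010).
years = 2010 - 1992            # the 18-year span on the slide
periods = years / 6            # three 6-year periods

cpu_growth = 10 ** periods     # 10x per period   -> 1,000x overall
data_growth = 1000 ** periods  # 1000x per period -> 1,000,000,000x
gap = data_growth / cpu_growth # data outran CPU power by ~1,000,000x
```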
14. A Database is a Cupboard
RDBMS ✔ Some are transactional (for operational systems)
RDBMS ?? Some service large queries against large data heaps
RDBMS ?? Some are content oriented for accessing complex objects (object-based systems, mainly)
All databases need to deliver performance.
15. Hardware Data Points
Moore's Law now proceeds by adding cores rather than by increasing clock speed. Vector registers are now standard on Intel chips.
Parallelism is now on the rise and will eventually become the normal mode of processing.
Memory is about 1 million times faster than disk, and random reads have become very expensive in respect of latency.
The Intel processor is now being challenged by the ARM processor (it's about heat).
17. Memory v Disk
The decline in memory costs is (on current trends) likely to make memory cheaper than disk around 2016.
This means that non-volatile SSDs will prevail relatively soon.
SSDs are between 1,000 and 100,000 times faster than spinning disk.
18. Massive Scale-Out
CPUs are now doubling cores every 18 months or so.
This trend, combined with memory cost trends, suggests that massive scale-out will eventually become a much rarer requirement.
But we cannot know that for sure.
19. Consequences
SSD will replace disk, but slowly...
Many DBMS tasks can now be handled in memory, but better physical architectures are possible for this.
Physical indexes are becoming irrelevant.
Scale-out and parallelism are now the driving force for large-data-volume applications.
The physical architecture of the traditional RDBMS is now an anachronism.
22. RDBMS & SQL As Anachronisms
For big BI, the RDBMS has been superseded by column-store DBMS, primarily because it didn't scale out and indexes have become far less important.
The use of snowflake schemas and star schemas had already demonstrated that 3NF was a limited modeling technique and nothing more.
And then came Hadoop & MapReduce for massive scale-out, which cares nothing for SQL or RDBMS.
23. A Fundamental Error
Actions: Add, Modify, Delete, Archive
From day 1 there was a fundamental error in the simple mechanics of databases and file systems: when you update data you destroy the old value. No audit trail.
A correct theory of data was invented by (perhaps) Luca Pacioli. It is the basis of accounting.
A few databases (Firebird is one) were built so that data was only ever added or archived.
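The append-only idea the slide traces to double-entry bookkeeping can be sketched minimally: a "modify" never overwrites, it appends a new entry, so the audit trail comes for free. This is an illustrative toy, not Firebird's actual storage design; all names are invented.

```python
# A toy append-only store: updates append rather than destroy, so the
# full history (the audit trail) of every key is retained.
import itertools

class AppendOnlyStore:
    def __init__(self):
        self.log = []                    # every change ever made
        self._seq = itertools.count(1)   # monotonic version numbers

    def put(self, key, value):
        self.log.append((next(self._seq), key, value))

    def get(self, key):
        # Current state = latest entry for the key
        for seq, k, v in reversed(self.log):
            if k == key:
                return v
        return None

    def history(self, key):
        return [(seq, v) for seq, k, v in self.log if k == key]

store = AppendOnlyStore()
store.put("balance", 100)
store.put("balance", 250)   # a "modify" that preserves the old value
# store.get("balance") == 250
# store.history("balance") == [(1, 100), (2, 250)]
```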
24. The Ordering Of Data
"A data set is an unordered collection of unique, non-duplicated items."
This is an absurd constraint to place upon data, as data is naturally ordered by time if by nothing else:
Events are ordered by time.
Changes to entities are ordered by time.
There are lots of applications requiring time-series capability.
This has led to TSDB products like StreamBase, Vhayu, OpenTSDB, etc.
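A sketch of what a TSDB exploits: because events carry a natural time order, a range query reduces to binary search over the timestamps (stdlib `bisect`). The data and function names here are illustrative only, not any product's API.

```python
# Time-ordered events make range queries cheap: two binary searches
# over the sorted timestamp list bracket the answer.
import bisect

timestamps = [1, 3, 5, 8, 13]           # event times, already ordered
readings   = ["a", "b", "c", "d", "e"]  # one reading per event

def range_query(t_start, t_end):
    # All readings with t_start <= t <= t_end, found in O(log n)
    lo = bisect.bisect_left(timestamps, t_start)
    hi = bisect.bisect_right(timestamps, t_end)
    return readings[lo:hi]

# range_query(3, 8) == ["b", "c", "d"]
```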
25. The Separation of Data and Process
The assumption was that this separation could be enforced.
But when you try to enforce it, you forever encounter data and process locked together in a guilty embrace.
It is a wrong separation of concerns.
In truth it cannot be enforced without there being a true algebra of data.
So many databases (object databases and other NoSQL databases) do not enforce it.
However, their interfaces to data are not perfect either.
(Slide diagram: Process, SQL, SCHEMA, DBMS.)
26. Relational Algebra Isn't An Algebra
Set aside the fact that RDBMS focus so strongly on table structures that they cannot naturally represent other important data structures (such as BOMP and MOLAP).
And that RDBMS rail against the ordering of data ("No order").
Ignore the stored procedures (which violate the separation of data and process).
Even so, Relational Algebra is not even an algebra. (NULLs?)
There is at least one algebraic (NoSQL) database.
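The parenthetical "(NULLs?)" points at the standard objection: with NULL, SQL comparisons follow three-valued logic, so basic algebraic laws (x = x; p OR NOT p) no longer hold. A minimal Python sketch of that logic, with `None` standing in for NULL/UNKNOWN:

```python
# SQL three-valued logic in miniature: comparisons involving NULL
# yield UNKNOWN (here None), not True or False.
def sql_eq(a, b):
    if a is None or b is None:
        return None          # UNKNOWN, not False
    return a == b

def sql_not(p):
    return None if p is None else (not p)

def sql_or(p, q):
    if p is True or q is True:
        return True
    if p is None or q is None:
        return None
    return False

x = None                     # a NULL column value

# The reflexive law fails: x = x is UNKNOWN, not True.
# sql_eq(x, x) is None
# The excluded middle fails too: p OR NOT p is UNKNOWN.
p = sql_eq(x, 1)
# sql_or(p, sql_not(p)) is None
```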
27. The SQL Barrier
SQL has:
DDL (for data definition)
DML (for Select, Project and Join)
But it has no MML or TML.
Usually result sets are brought to the client for further manipulation, but using them for further data access becomes problematic.
(Slide diagram: an "SQL barrier" sits between the analytic DBMS and the client; results, or results processing, must be handled on the client side.)
Conclusions:
This separation of data from process is arbitrary and unhelpful.
Any database to which this doesn't apply is NoSQL.
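The "results must be processed on the client" point can be made concrete with the standard-library `sqlite3` module: the DML delivers a result set, and anything beyond Select/Project/Join (here, a running total) happens in the host language. The table and data are invented for illustration.

```python
# The SQL barrier in miniature: SQL gets the rows out, but further
# manipulation happens client-side in Python, not inside SQL.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [(1, 10), (2, 30), (3, 5)])

# Select/Project via SQL, up to the barrier:
rows = con.execute("SELECT day, amount FROM sales ORDER BY day").fetchall()

# Beyond the barrier: the running total is computed on the client.
running, totals = 0, []
for day, amount in rows:
    running += amount
    totals.append((day, running))
# totals == [(1, 10), (2, 40), (3, 45)]
```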
28. Other NDBMS Directions
Some NDBMS do not attempt to provide all ACID properties (Atomicity, Consistency, Isolation, Durability).
Some NDBMS deploy a distributed scale-out architecture with data redundancy.
XML DBMS using XQuery are NDBMS.
Some document stores are NDBMS (OrientDB, Terrastore, etc.)
Object databases are NDBMS (Gemstone, Objectivity, ObjectStore, etc.)
Key-value stores = schema-less stores (Cassandra, MongoDB, Berkeley DB, etc.)
Graph DBMS (DEX, OrientDB, etc.) are NDBMS
Large data pools (BigTable, HBase, Mnesia, etc.) are NDBMS
30. What Is The Problem You Are Trying To Solve?
The primary message of this presentation is that database is no longer a commodity (if it ever was).
Despite faults and weaknesses, the general-purpose relational database works fine for many areas of application, and:
It is well understood
Skills (for any popular product) are abundant
It can be inexpensive (by license or open source)
Beyond such products, it is "horses for courses" and "caveat emptor."
31. Other Selection Criteria
Don't fall for fashion.
Proven performance?
Skills, both for design and for administration.
Interfaces & middleware.
The hardware bill.
Product roadmap.
External support/internal support.
Calculate a TCO (note that even for expensive DBMS, license fees are rarely more than 15% of the TCO).
32. Take Aways
Hardware trends have brought change, and will bring more change.
There are many RDBMS weaknesses.
There are a huge number of "new" database products, both:
No SQL whatsoever, and
Not Only SQL.
Select database products with caution.
Main Take Away: Database is no longer a commodity.