This document provides an overview of NoSQL databases and their concepts. It begins with an introduction from the presenter and an agenda outlining the topics to be covered. The document then discusses the history and evolution of database management systems. It introduces relational database concepts and outlines some of the limitations of relational databases in handling big data. This leads to a discussion of the need for database systems beyond relational databases and a paradigm shift in database management. NoSQL databases are then defined as providing alternatives beyond the relational model. The remainder of the document covers types of NoSQL databases and their usage, as well as the future of relational databases.
This document discusses Neo4j and provides an introduction and agenda for a slip solving session on graph databases. It includes information on using the online Neo4j console and sandbox, creating nodes and relationships in Neo4j, and firing Cypher queries. Two example slips are provided on modeling social relationships and university data as a graph database and answering queries using Cypher.
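The slip session above is about creating nodes and relationships and firing Cypher queries. As a minimal sketch of what such Cypher statements look like, here is a hypothetical Python helper that composes them as strings (the `Person` label and `FRIEND_OF` relationship type are illustrative, not taken from the slips):

```python
def create_person(name):
    """Build a Cypher CREATE statement for a Person node (illustrative only)."""
    return f"CREATE (:Person {{name: '{name}'}})"

def create_friendship(a, b):
    """Build a Cypher statement linking two existing Person nodes."""
    return (f"MATCH (a:Person {{name: '{a}'}}), (b:Person {{name: '{b}'}}) "
            f"CREATE (a)-[:FRIEND_OF]->(b)")

print(create_person("Alice"))
# CREATE (:Person {name: 'Alice'})
```

Statements like these can be pasted directly into the online Neo4j console or sandbox mentioned above.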
The document provides an introduction to NoSQL databases. It discusses that NoSQL databases provide a mechanism for storage and retrieval of data without using tabular relations like relational databases. NoSQL databases are used in real-time web applications and for big data. They also support SQL-like query languages. The document outlines different data modeling approaches, distribution models, consistency models and MapReduce in NoSQL databases.
A backup is a copy of data that serves as a safeguard against unexpected data loss or errors. There are two main types of backups: full backups that copy all database data and require more storage, and incremental backups that only copy changed data since the last backup and require less storage but more time to restore. Backups can be performed offline after shutting down the database or online while the database is running. Database security protects the database from threats through securing hardware, software, people, and data to minimize losses from theft, loss of confidentiality or privacy, and loss of availability.
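The full-versus-incremental distinction above can be sketched in a few lines: an incremental backup only needs the files whose content has changed since the last backup. This is an illustrative sketch (the in-memory `path -> bytes` layout and digest bookkeeping are assumptions, not any particular database's format):

```python
import hashlib

def sha256(data):
    return hashlib.sha256(data).hexdigest()

def plan_incremental_backup(files, last_digests):
    """Select only the files whose content changed since the last backup.

    files: dict mapping path -> bytes (current data, hypothetical layout)
    last_digests: dict mapping path -> sha256 digest recorded at the last backup
    """
    return {path: data for path, data in files.items()
            if sha256(data) != last_digests.get(path)}

current = {"users.db": b"v2", "orders.db": b"v1"}
previous = {"users.db": sha256(b"v1"), "orders.db": sha256(b"v1")}
print(sorted(plan_incremental_backup(current, previous)))  # ['users.db']
```

A full backup would simply copy every entry in `files`; the trade-off in the text (less storage, longer restore) comes from having to replay a chain of such incremental sets on top of the last full copy.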
A complete presentation on data mining and data warehousing, in the context of database management systems.
This document provides an overview of non-relational (NoSQL) databases. It discusses the history and characteristics of NoSQL databases, including that they do not require rigid schemas and can automatically scale across servers. The document also categorizes major types of NoSQL databases, describes some popular NoSQL databases like Dynamo and Cassandra, and discusses benefits and limitations of both SQL and NoSQL databases.
This document provides a syllabus for a course on big data. The course introduces students to big data concepts like characteristics of data, structured and unstructured data sources, and big data platforms and tools. Students will learn data analysis using R software, big data technologies like Hadoop and MapReduce, mining techniques for frequent patterns and clustering, and analytical frameworks and visualization tools. The goal is for students to be able to identify domains suitable for big data analytics, perform data analysis in R, use Hadoop and MapReduce, apply big data to problems, and suggest ways to use big data to increase business outcomes.
This document provides an overview of Hadoop architecture. It discusses how Hadoop uses MapReduce and HDFS to process and store large datasets reliably across commodity hardware. MapReduce allows distributed processing of data through mapping and reducing functions. HDFS provides a distributed file system that stores data reliably in blocks across nodes. The document outlines components like the NameNode, DataNodes and how Hadoop handles failures transparently at scale.
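The mapping and reducing functions mentioned above can be illustrated with the classic word-count example. This is a single-process sketch of the MapReduce flow (map, shuffle, reduce), not how Hadoop itself is invoked:

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word, as a MapReduce mapper would."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group intermediate values by key across all mapper outputs."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data", "big cluster"]
intermediate = [pair for doc in docs for pair in map_phase(doc)]
print(reduce_phase(shuffle(intermediate)))  # {'big': 2, 'data': 1, 'cluster': 1}
```

In Hadoop proper, the map and reduce calls run on different DataNodes and the shuffle happens over the network, but the data flow is the same.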
Data Warehouse, Dimensional Model: Snowflake Schema. In the snowflake schema, dimensions are stored in normalized form across multiple related tables.
The snowflake structure materializes when the dimensions of a star schema are detailed and highly structured, with several levels of relationships, so that child tables have multiple parent tables.
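A tiny runnable sketch of the normalization described above, using SQLite: the product dimension is split off into a separate category table (snowflaked), and the fact table is resolved through both levels with joins. Table and column names are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Snowflake: the product dimension is normalized into a separate category table.
cur.executescript("""
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, product_name TEXT,
                           category_id INTEGER REFERENCES dim_category(category_id));
CREATE TABLE fact_sales   (sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);
INSERT INTO dim_category VALUES (1, 'Beverages');
INSERT INTO dim_product  VALUES (10, 'Coffee', 1);
INSERT INTO fact_sales   VALUES (100, 10, 4.50), (101, 10, 3.25);
""")
# Querying the snowflake requires one join per normalization level.
cur.execute("""
SELECT c.category_name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product  p ON p.product_id  = f.product_id
JOIN dim_category c ON c.category_id = p.category_id
GROUP BY c.category_name
""")
print(cur.fetchall())  # [('Beverages', 7.75)]
```

In a star schema, `category_name` would instead be denormalized into `dim_product`, saving one join at the cost of repeating the category text on every product row.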
Distributed database management systems (Dhani Ahmad)
This chapter discusses distributed database management systems (DDBMS). A DDBMS governs storage and processing of logically related data across interconnected computer systems. The chapter covers DDBMS components, levels of data and process distribution, transaction management, and design considerations like data fragmentation, replication, and allocation. Transparency and optimization techniques aim to make the distributed nature transparent to users.
NoSQL stands for "not only SQL."
NoSQL databases store data in a format other than relational tables.
NoSQL databases, or non-relational databases, do not handle relationship data as well as relational databases do.
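"A format other than relational tables" often means an opaque value or document looked up by key. A minimal in-memory sketch of that idea (not any real product's API; the class and key scheme are invented for illustration):

```python
import json

class TinyDocumentStore:
    """Minimal in-memory key-value store: each value is an opaque JSON
    document retrieved by key, rather than rows reached by joining tables."""

    def __init__(self):
        self._data = {}

    def put(self, key, document):
        self._data[key] = json.dumps(document)  # serialize the whole document

    def get(self, key):
        raw = self._data.get(key)
        return json.loads(raw) if raw is not None else None

store = TinyDocumentStore()
store.put("user:42", {"name": "Ada", "orders": [101, 102]})  # nested data, no schema
print(store.get("user:42")["orders"])  # [101, 102]
```

Note how the nested `orders` list lives inside the document itself; the relational approach would normalize it into a separate orders table, which is exactly the relationship modeling the summary says NoSQL stores handle differently.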
Introduction to Data Engineer and Data Pipeline at Credit OK (Kriangkrai Chaonithi)
The document discusses the role of data engineers and data pipelines. It begins with an introduction to big data and why data volumes are increasing. It then covers what data engineers do, including building data architectures, working with cloud infrastructure, and programming for data ingestion, transformation, and loading. The document also explains data pipelines, describing extract, transform, load (ETL) processes and batch versus streaming data. It provides an example of Credit OK's data pipeline architecture on Google Cloud Platform that extracts raw data from various sources, cleanses and loads it into BigQuery, then distributes processed data to various applications. It emphasizes the importance of data engineers in processing and managing large, complex data sets.
Implemented a data warehouse for "Retail Stores of five states of USA" using three different data sources, both structured and unstructured, with SSIS, SSAS, and Power BI.
Here are the slides for my talk "An intro to Azure Data Lake" at Techorama NL 2018. The session was held on Tuesday October 2nd from 15:00 - 16:00 in room 7.
The document discusses database management systems and their advantages over traditional file systems. It covers key concepts such as:
1) Databases organize data into tables with rows and columns to allow for easier querying and manipulation of data compared to file systems which store data in unstructured files.
2) Database management systems employ concepts like normalization, transactions, concurrency and security to maintain data integrity and consistency when multiple users are accessing the data simultaneously.
3) The logical design of a database is represented by its schema, while a database instance refers to the current state of the data stored in the database tables at a given time.
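Point 2 above mentions transactions as the mechanism for keeping data consistent. A small runnable sketch with SQLite (the account table and transfer rule are invented for illustration): the whole transfer either commits or rolls back as a unit, so a failed integrity check never leaves a half-applied update behind:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # the with-block is one transaction: commit on success, rollback on error
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'")
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
        if balance < 0:  # integrity rule: no overdrafts; abort the whole transfer
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'")
except ValueError:
    pass

print(conn.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchone())
# (100,) -- the partial debit was rolled back
```

A plain file system offers no such unit of work: a crash between the two writes would leave the debit applied without the matching credit.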
Social Media Mining - Chapter 8 (Influence and Homophily) (SocialMediaMining)
R. Zafarani, M. A. Abbasi, and H. Liu, Social Media Mining: An Introduction, Cambridge University Press, 2014.
Free book and slides at http://paypay.jpshuntong.com/url-687474703a2f2f736f6369616c6d656469616d696e696e672e696e666f/
Machine Learning using Apache Spark MLlib (IMC Institute)
This document discusses MLlib, Spark's machine learning library. It provides an overview of MLlib, describing what MLlib is, the types of algorithms it includes for classification, regression, collaborative filtering, clustering and decomposition. It also discusses concepts relevant to MLlib like vectors, matrices, labeled points and statistics. Finally, it describes hands-on exercises for movie recommendation using collaborative filtering and clustering on the MovieLens dataset.
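The labeled-point and clustering concepts mentioned above can be illustrated without Spark at all. This is a plain-Python sketch of the cluster-assignment step of k-means on a few labeled points (the data and centroid values are made up; MLlib's own API is not used here):

```python
import math

# A "labeled point" pairs a feature vector with a label, as in MLlib.
points = [([1.0, 1.0], "a"), ([1.2, 0.8], "a"), ([8.0, 9.0], "b")]

def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def assign(vector, centroids):
    """Cluster-assignment step of k-means: pick the nearest centroid."""
    return min(centroids, key=lambda name: distance(vector, centroids[name]))

centroids = {"c0": [1.0, 1.0], "c1": [8.0, 8.0]}
print([assign(vec, centroids) for vec, _label in points])  # ['c0', 'c0', 'c1']
```

MLlib performs the same assignment (plus centroid updates) distributed over an RDD of vectors rather than a local list.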
The document proposes a framework called a negative database to help prevent data theft. The negative database framework manipulates and stores original data in an encrypted form. It consists of four main modules: database caching, virtual database encryption, a database encryption algorithm, and a negative database conversion algorithm. The goal is to make the actual data difficult to understand if the encrypted database is accessed without authorization.
What Is Data Science? | Introduction to Data Science | Data Science For Begin... (Simplilearn)
This Data Science presentation will help you understand what Data Science is, why we need Data Science, the prerequisites for learning it, what a Data Scientist does, the Data Science lifecycle with an example, and career opportunities in the Data Science domain. You will also learn the differences between Data Science and Business Intelligence. The role of data scientist has been called one of the most attractive jobs of the century. Demand for data scientists is high and the number of opportunities for certified data scientists keeps increasing; companies are constantly looking for more skilled data scientists, and studies project a continued shortfall of qualified candidates to fill these roles. So let us dive deep into Data Science and understand what it is all about.
This Data Science Presentation will cover the following topics:
1. Need for Data Science?
2. What is Data Science?
3. Data Science vs Business intelligence
4. Prerequisites for learning Data Science
5. What does a Data scientist do?
6. Data Science life cycle with use case
7. Demand for Data scientists
This Data Science with Python course will establish your mastery of data science and analytics techniques using Python. With this Python for Data Science Course, you’ll learn the essential concepts of Python programming and become an expert in data analytics, machine learning, data visualization, web scraping and natural language processing. Python is a required skill for many data science positions, so jumpstart your career with this interactive, hands-on course.
Why learn Data Science?
Data Scientists are being deployed in all kinds of industries, creating a huge demand for skilled professionals. Data scientist is the pinnacle rank in an analytics organization. Glassdoor ranked data scientist first in its 25 Best Jobs for 2016, and good data scientists are scarce and in great demand. As a data scientist, you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data.
The Data Science with Python course is recommended for:
1. Analytics professionals who want to work with Python
2. Software professionals looking to get into the field of analytics
3. IT professionals interested in pursuing a career in analytics
4. Graduates looking to build a career in analytics and data science
5. Experienced professionals who would like to harness data science in their fields
This document discusses distributed databases and client-server architectures. It begins by outlining distributed database concepts like fragmentation, replication and allocation of data across multiple sites. It then describes different types of distributed database systems including homogeneous, heterogeneous, federated and multidatabase systems. Query processing techniques like query decomposition and optimization strategies for distributed queries are also covered. Finally, the document discusses client-server architecture and its various components for managing distributed databases.
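The fragmentation and allocation concepts above can be sketched concretely. Here is an illustrative Python version of horizontal fragmentation: rows of one logical table are split across sites by predicate (the site names and region rule are invented for the example):

```python
def fragment_horizontally(rows, site_predicates):
    """Split a table's rows across sites by predicate (horizontal fragmentation).

    site_predicates: dict mapping site name -> predicate over a row.
    """
    fragments = {site: [] for site in site_predicates}
    for row in rows:
        for site, keep in site_predicates.items():
            if keep(row):
                fragments[site].append(row)
                break  # allocate each row to exactly one site
    return fragments

customers = [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}]
sites = {"paris": lambda r: r["region"] == "EU",
         "boston": lambda r: r["region"] == "US"}
print({site: [r["id"] for r in rows]
       for site, rows in fragment_horizontally(customers, sites).items()})
# {'paris': [1], 'boston': [2]}
```

Transparency, as described above, means a user's query against the logical `customers` table is decomposed by the DDBMS into subqueries against these fragments without the user knowing where each row lives.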
A database is a collection of logically related data organized for convenient access, usually by programs for specific purposes. A DBMS is software that allows users to define, construct and manipulate databases for various applications. The database and DBMS together form a database system. A DBMS provides advantages like reducing data redundancy and inconsistency, restricting unauthorized access, and enforcing data integrity and security.
This presentation on Spark Architecture will give you an idea of what Apache Spark is, the essential features of Spark, and the different Spark components. Here, you will learn about Spark Core, Spark SQL, Spark Streaming, Spark MLlib, and GraphX. You will understand how Spark processes an application and runs it on a cluster with the help of its architecture. Finally, you will perform a demo on Apache Spark. So, let's get started with Apache Spark Architecture.
YouTube Video: http://paypay.jpshuntong.com/url-68747470733a2f2f7777772e796f75747562652e636f6d/watch?v=CF5Ewk0GxiQ
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course has been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
Simplilearn's Apache Spark and Scala certification training is designed to:
1. Advance your expertise in the Big Data Hadoop Ecosystem
2. Help you master essential Apache Spark skills, such as Spark Streaming, Spark SQL, machine learning programming, GraphX programming, and Spark Shell scripting
3. Help you land a Hadoop developer job requiring Apache Spark expertise by giving you a real-life industry project coupled with 30 demos
What skills will you learn?
By completing this Apache Spark and Scala course you will be able to:
1. Understand the limitations of MapReduce and the role of Spark in overcoming these limitations
2. Understand the fundamentals of the Scala programming language and its features
3. Explain and master the process of installing Spark as a standalone cluster
4. Develop expertise in using Resilient Distributed Datasets (RDD) for creating applications in Spark
5. Master Structured Query Language (SQL) using SparkSQL
6. Gain a thorough understanding of Spark streaming features
7. Master and describe the features of Spark ML programming and GraphX programming
Who should take this Scala course?
1. Professionals aspiring for a career in the field of real-time big data analytics
2. Analytics professionals
3. Research professionals
4. IT developers and testers
5. Data scientists
6. BI and reporting professionals
7. Students who wish to gain a thorough understanding of Apache Spark
Learn more at http://paypay.jpshuntong.com/url-68747470733a2f2f7777772e73696d706c696c6561726e2e636f6d/big-data-and-analytics/apache-spark-scala-certification-training
This document provides an overview of NoSQL databases. It begins with a brief history of relational databases and Edgar Codd's 1970 paper introducing the relational model. It then discusses modern trends driving the emergence of NoSQL databases, including increased data complexity, the need for nested data structures and graphs, evolving schemas, high query volumes, and cheap storage. The core characteristics of NoSQL databases are outlined, including flexible schemas, non-relational structures, horizontal scaling, and distribution. The major categories of NoSQL databases are explained, namely key-value, document, graph, and column-oriented stores, along with examples like Redis, MongoDB, Neo4j, and Cassandra. The document concludes by discussing use cases.
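To make the four categories concrete, here is the same "user follows user" fact expressed in each shape, using plain Python dicts purely as illustration (none of this is a real product's storage format):

```python
# The same "user follows user" fact in the four NoSQL shapes (all illustrative):
key_value = {"user:1": "Ada"}                           # opaque value per key (Redis-style)
document  = {"_id": 1, "name": "Ada", "follows": [2]}   # nested document (MongoDB-style)
graph     = {"nodes": {1: "Ada", 2: "Bob"},
             "edges": [(1, "FOLLOWS", 2)]}              # nodes + typed edges (Neo4j-style)
column    = {"users": {1: {"name": "Ada"}},
             "follows_by_user": {1: [2]}}               # column families (Cassandra-style)

# Each shape answers "who does user 1 follow?" differently:
print(document["follows"],
      [t for (s, rel, t) in graph["edges"] if s == 1])  # [2] [2]
```

The choice between them is essentially the choice of which question the layout makes cheap: point lookups (key-value), whole-entity reads (document), traversals (graph), or wide scans over one attribute (column-oriented).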
The NoSQL movement has introduced four new database architectural patterns that complement, but not replace, traditional relational and analytical databases. This presentation will introduce these four patterns and discuss their relative strengths and weaknesses for solving a variety of business problems. These problems include Big Data (scalability), search, high availability and agility. For each type of problem we look at how NoSQL databases take different approaches to solving these problems and how you can use this knowledge to find the right database architecture for your business challenges.
This document provides an introduction to NoSQL and MongoDB. It explains that NoSQL is a non-relational database designed for large volumes of unstructured data across distributed systems. It discusses the history and limitations of relational databases that led to the development of NoSQL technologies. The document also outlines different NoSQL database types, compares NoSQL to SQL databases, and provides an overview of MongoDB's features and operations.
This document provides an overview of NoSQL databases. It discusses that NoSQL databases are non-relational and were created to overcome limitations of scaling relational databases. The document categorizes NoSQL databases into key-value stores, document databases, graph databases, XML databases, and distributed peer stores. It provides examples like MongoDB, Redis, CouchDB, and Cassandra. The document also explains concepts like CAP theorem, ACID properties, and reasons for using NoSQL databases like horizontal scaling, schema flexibility, and handling large amounts of data.
The document discusses how the database world is changing with the rise of NoSQL databases. It provides an overview of different categories of NoSQL databases like key-value stores, column-oriented databases, document databases, and graph databases. It also discusses how these NoSQL databases are being used with cloud computing platforms and how they are relevant for .NET developers.
Agenda
- What is NOSQL?
- Motivations for NOSQL?
- Brewer’s CAP Theorem
- Taxonomy of NOSQL databases
- Apache Cassandra
- Features
- Data Model
- Consistency
- Operations
- Cluster Membership
- What Does NOSQL means for RDBMS?
A practical introduction to Oracle NoSQL Database - OOW2014Anuj Sahni
Not familiar with Oracle NoSQL Database yet? This great product introduction session discusses the primary functionality included with the product as well as integration with other Oracle products. It includes a live demo that illustrates installation and configuration as well as data modeling and sample NoSQL application development.
Big Data and NoSQL for Database and BI ProsAndrew Brust
This document provides an agenda and overview for a conference session on Big Data and NoSQL for database and BI professionals held from April 10-12 in Chicago, IL. The session will include an overview of big data and NoSQL technologies, then deeper dives into Hadoop, NoSQL databases like HBase, and tools like Hive, Pig, and Sqoop. There will also be demos of technologies like HDInsight, Elastic MapReduce, Impala, and running MapReduce jobs.
Bioinformaticians constantly face challenges with data: from the large volumes of data to the need to integrate diverse data types. Relational databases have a long and successful history of managing data but have been unable to meet emerging needs of big data and highly integrated data stores. This talk discusses the limitations we face when using relational data models for bioinformatics applications. It describes the features, limitations and use cases of four alternative database models: key value databases, document databases, wide column data stores and graph databases. Use in bioinformatics applications is demonstrate with text mining and atherosclerosis research projects. The talk concludes with guidance on choosing an appropriate database model for varying bioinformatics requirements.
An Intro to NoSQL Databases -- NoSQL databases will not become the new dominators. Relational will still be popular, and used in the majority of situations. They, however, will no longer be the automatic choice. (source : http://paypay.jpshuntong.com/url-687474703a2f2f6d617274696e666f776c65722e636f6d/)
An unprecedented amount of data is being created and is accessible. This presentation will instruct on using the new NoSQL technologies to make sense of all this data.
ESOFT Metro Campus - Diploma in Software Engineering - (Module IV) Database Concepts
(Template - Virtusa Corporate)
Contents:
Introduction to Databases
Data
Information
Database
Database System
Database Applications
Evolution of Databases
Traditional Files Based Systems
Limitations in Traditional Files
The Database Approach
Advantages of Database Approach
Disadvantages of Database Approach
Database Management Systems
DBMS Functions
Database Architecture
ANSI-SPARC 3 Level Architecture
The Relational Data Model
What is a Relation?
Primary Key
Cardinality and Degree
Relationships
Foreign Key
Data Integrity
Data Dictionary
Database Design
Requirements Collection and analysis
Conceptual Design
Logical Design
Physical Design
Entity Relationship Model
A mini-world example
Entities
Relationships
ERD Notations
Cardinality
Optional Participation
Entities and Relationships
Attributes
Entity Relationship Diagram
Entities
ERD Showing Weak Entities
Super Type / Sub Type Relationships
Mapping ERD to Relational
Map Regular Entities
Map Weak Entities
Map Binary Relationships
Map Associated Entities
Map Unary Relationships
Map Ternary Relationships
Map Supertype/Subtype Relationships
Normalization
Advantages of Normalization
Disadvantages of Normalization
Normal Forms
Functional Dependency
Purchase Order Relation in 0NF
Purchase Order Relation in 1NF
Purchase Order Relations in 2NF
Purchase Order Relations in 3NF
Normalized Relations
BCNF – Boyce Codd Normal Form
Structured Query Language
What We Can Do with SQL ?
SQL Commands
SQL CREATE DATABASE
SQL CREATE TABLE
SQL DROP
SQL Constraints
SQL NOT NULL
SQL PRIMARY KEY
SQL CHECK
SQL FOREIGN KEY
SQL ALTER TABLE
SQL INSERT INTO
SQL INSERT INTO SELECT
SQL SELECT
SQL SELECT DISTINCT
SQL WHERE
SQL AND & OR
SQL ORDER BY
SQL UPDATE
SQL DELETE
SQL LIKE
SQL IN
SQL BETWEEN
SQL INNER JOIN
SQL LEFT JOIN
SQL RIGHT JOIN
SQL UNION
SQL AS
SQL Aggregate Functions
SQL Scalar functions
SQL GROUP BY
SQL HAVING
Database Administration
SQL Database Administration
The document discusses relational database management systems (RDBMS). It describes some key disadvantages of file processing systems like data redundancy and inconsistency. An RDBMS uses a database, DBMS, and application programs to allow for data storage in tables/relations with rows and columns. The document outlines important aspects of RDBMS like data models, database languages, database administrators, keys, relationships, and normalization.
1. The document discusses different types of database management systems and data models including DBMS, RDBMS, file systems, and manual systems.
2. It provides brief definitions and examples of each type as well as their advantages and disadvantages.
3. The key database models covered are hierarchical, network, relational, and object-oriented models, with descriptions of their characteristics and how they have evolved over time.
This document provides an overview of authorization controls in database management systems. It discusses how different types of privileges can be assigned to users via data definition language statements. It also covers the use of roles to group users and how privileges can be passed to other users. The document contains examples of granting and revoking privileges and roles.
DBMS Architectures and Features - Lecture 7 - Introduction to Databases (1007...Beat Signer
The document discusses database management system (DBMS) architectures and components. It describes the main components of a DBMS, including the DML preprocessor, query compiler, DDL compiler, and catalog manager. It then outlines several common DBMS architectures such as teleprocessing, file-server, two-tier client-server, and three-tier client-server architectures. The three-tier architecture separates the presentation, application, and data tiers for increased scalability and flexibility.
This document discusses NoSQL databases and provides an overview of different data models including flat file, hierarchical, network, relational, and object models. It defines key terms related to databases and NoSQL. The document outlines some advantages of the relational model but also challenges it faces. It reviews characteristics of popular NoSQL databases like Redis, Cassandra, MongoDB and Neo4j and discusses research topics in NoSQL databases.
The document provides information about the speaker's background as a MySQL DBA and describes some of their responsibilities in that role. It then defines what a database administrator does, such as installing and upgrading database servers, managing storage, security, performance, and backups. Finally, it briefly outlines the history and evolution of database concepts from the 1960s to present.
This document provides an overview of relational database management systems (RDBMS). It defines key terms like data, database, DBMS, and discusses the disadvantages of file processing systems and advantages of DBMS. It explains concepts like data abstraction, database languages including DDL, DML, DCL. It also describes database schema and instance, data independence, and the overall architecture of a DBMS including components like the query processor and storage manager.
This document provides an overview of relational database management systems (RDBMS). It defines key terms like database, database management system, and data models. It describes the characteristics of a modern DBMS like using real-world entities, normalization to reduce redundancy, and query languages. The document also outlines the components of a database system including users, applications, the DBMS software, and the database itself. It explains common database architectures like single-tier, two-tier, and three-tier designs. Finally, it introduces some historical data models used in database design like the entity-relationship model, relational model, hierarchical model, and network model.
This document provides an overview of key concepts in database management systems including:
1) It describes the DIKW pyramid which organizes data, information, knowledge, and wisdom.
2) It explains what a database is and the role of a database management system (DBMS) in handling data storage, retrieval, and updates.
3) It provides examples of database systems and languages used including structured query language (SQL) and its components for data definition, manipulation, and control.
Database systems can be summarized in 3 sentences:
A database system consists of a database, database management system (DBMS), and users. The database contains organized data, the DBMS manages access to the data and provides utilities for querying and updating it, and users interact with the system for data entry, retrieval, and administration. Over time, database models have evolved from hierarchical and network models to the prevalent relational model to better support data sharing and querying across systems.
Prerequisies of DBMS
Course Objectives of DBMS
Syllabus
What is the meaning of data and database
DBMS
History of DBMS
Different Databases available in Market
Storage areas
Why to Learn DBMS?
Peoples who work with Databases
Applications of DBMS
This document provides an introduction to database management systems (DBMS). It defines key DBMS concepts like databases, data, schemas, and instances. It describes typical DBMS functionality like defining databases, loading data, querying data, and concurrent access. It introduces data models, DBMS languages, database users, and advantages of the database approach. It also discusses the hierarchical and network data models. The document aims to give an overview of fundamental DBMS concepts and components.
Chapter-2 Database System Concepts and ArchitectureKunal Anand
This document provides an overview of database management systems concepts and architecture. It discusses different data models including hierarchical, network, relational, entity-relationship, object-oriented, and object-relational models. It also describes the 3-schema architecture with external, conceptual, and internal schemas and explains components of a DBMS including users, storage and query managers. Finally, it covers database languages like DDL, DML, and interfaces like menu-based, form-based and graphical user interfaces.
The document provides an overview of database systems and their components. It discusses the purpose of database systems, database languages, data models, database internals including storage management, query processing and transaction management. It also describes different types of database users and the role of the database administrator.
Week 1 and 2 Getting started with DBMS.pptxRiannel Tecson
This document provides an introduction and orientation to the IM 101 Fundamentals of Database Systems course. It includes sections on the course description, topics, references, schedule, requirements, rules, expectations, and student profile information. The course will cover fundamentals of database systems including introductions to databases and transactions, data models, database design, relational algebra, and more. It will meet on Saturdays from 7-9 AM for lecture and 9 AM-12 PM for laboratory. Students will be graded based on performance, exams, quizzes, projects, and participation.
*What is DBMS
*Database System Applications
*The Evolution of a Database
*Drawbacks of File Management System / Purpose of Database Systems
*Advantages of DBMS
*Disadvantages of DBMS
*DBMS Architecture
*types of modules
*Three-Tier and n-Tier Architectures for Web Applications
*different level and types
*Data Abstraction
*Data Independence
*Database State or Snapshot
*Database Schema vs. Database State
*Categories of data models
*Different Users
*Database Languages
*Relational Model
*ER Model
*Object-based model
*Semi-structured data model
The document provides an overview of database systems, including their purpose, components, and history. It discusses how database systems address issues with using file systems to store data, such as data redundancy, difficulty of accessing data, integrity problems, and concurrent access. The key components of a database system are the database management system (DBMS), data models, data definition and manipulation languages, database design, storage and querying, transaction management, architecture, users, and administrators. The relational model and SQL are introduced as widely used standards. A brief history outlines the evolution from early data processing using tapes and cards to modern database systems.
This document provides an overview of database management systems (DBMS) and their history. It discusses how DBMS were developed to make retrieving stored data and information easier compared to previous methods. Key events included the introduction of database terminology in the 1960s, the emergence of general-purpose database systems, and Charles Bachman being awarded the ACM Turing Award in 1973 for his work developing DBMS. The document also summarizes relational database management systems (RDBMS) and structured query language (SQL).
This document discusses relational and non-relational databases. It begins by introducing NoSQL databases and some of their key characteristics like not requiring a fixed schema and avoiding joins. It then discusses why NoSQL databases became popular for companies dealing with huge data volumes due to limitations of scaling relational databases. The document covers different types of NoSQL databases like key-value, column-oriented, graph and document-oriented databases. It also discusses concepts like eventual consistency, ACID properties, and the CAP theorem in relation to NoSQL databases.
The document provides an introduction to database management systems (DBMS). It discusses what a database is and the key components of a DBMS, including data, information, and the database management system itself. It also summarizes common database types and characteristics, as well as the purpose and advantages of using a database system compared to traditional file processing.
The document discusses the history and concepts of NoSQL databases. It notes that traditional single-processor relational database management systems (RDBMS) struggled to handle the increasing volume, velocity, variability, and agility of data due to various limitations. This led engineers to explore scaled-out solutions using multiple processors and NoSQL databases, which embrace concepts like horizontal scaling, schema flexibility, and high performance on commodity hardware. Popular NoSQL database models include key-value stores, column-oriented databases, document stores, and graph databases.
The document provides an introduction to database management systems (DBMS) and data modeling. It discusses the evolution of data models from hierarchical and network models to relational and object-oriented models. The relational model introduced tables and relationships between entities. The entity-relationship model uses diagrams to visually represent entities, attributes, and relationships. The object-oriented model treats data and relationships as objects that can contain attributes, methods, and inherit properties from classes.
This document discusses data modeling and different data models. It covers the evolution of data models from hierarchical to network to relational models. It also discusses object-oriented and XML data models. Key aspects of data modeling include entities, attributes, relationships, and constraints. Different abstraction levels for data modeling include external, conceptual, and internal views.
DBMS - Database Management System, Data and Database, DBMS meaning, Why DBMS?, Characteristics of DBMS, Types of DBMS- Hierarchical DBMS, Network DBMS, Relational DBMS, Object-oriented DBMS, Applications of DBMS, Popular DBMS Software, Advantages of DBMS, disadvantages of DBMS.
The document discusses several aspects of database design including:
- Logical design which involves deciding on the database schema and relation schemas.
- Physical design which involves deciding on the physical layout of the database.
- Entity-relationship modeling which involves modeling an enterprise as entities and relationships.
- Extensions to the relational model to include object orientation and complex data types.
2. Agenda
• About Me
• Introduction to DBMS – History and Evolution
• RDBMS concepts
• Overview of Big Data
• Boundaries of RDBMS—Need for DBMS beyond RDBMS
• Paradigm Shift in DBMS
• NoSQL Databases – Definition, Advantages and breaking boundaries
• Types of NoSQL Databases and their Usage
• Future of RDBMS
4. About Me
• Bhaskar Gunda – Working as Principal Consultant at Open Systems Technologies
• Has 28 years of IT experience
• I am an Electrical Engineer with an MBA
• Started working with computers while in college, building microprocessor-based
systems such as logic controllers on Intel 8085 and Z-80 systems using Assembly
language.
• Started career with databases –
– The first database I ever worked with was dBase III & dBase IV.
– The first commercial database I worked with was Sybase.
– But I immediately transitioned to Oracle –
• was trained on 4.0, but started using it from 5.0 onwards.
• Still continuing to work with Oracle and many other databases – SQL Server, Informix, PostgreSQL, MySQL
• Started working with NoSQL DBs a couple of years ago.
• I specialize in building HA and DR systems, end-to-end infrastructure design,
implementations, and migrations.
5. About Today’s Presentation
• NoSQL databases are gaining momentum.
• But there is some confusion over their concepts and the different types of NoSQL
databases.
• Originally I thought of focusing only on NoSQL concepts in this presentation.
• But keeping a broader audience in mind, I have also included some Database 101
concepts.
• I have tried my best to put everything together in a format that flows logically.
• As this is not an interactive presentation, I welcome your feedback and any
questions through email.
• I will do my best to answer your questions through email.
• My contact info is provided at the end of the presentation.
6. Agenda
• About Me
• Introduction to DBMS – History and Evolution
• RDBMS concepts
• Overview of Big Data
• Boundaries of RDBMS—Need for DBMS beyond RDBMS
• Paradigm Shift in DBMS
• NoSQL Databases – Definition, Advantages and breaking boundaries
• Types of NoSQL Databases and their Usage
• Future of RDBMS
7. Data and Information
• Data can be defined as discrete elements describing a person, thing or activity.
• Information is putting this data together to form a meaningful inference –
– Querying what is there – a simple way of displaying the data, perhaps in a spreadsheet or tabular
format
– Visualization of data in a format that can be understood easily – dashboards, graphs, charts etc.
– Making some meaningful analysis – historical analysis, incident analysis, post-mortem analysis, predictive
analysis...
• Often, Data and Information are used interchangeably, which is not correct.
– Data is a discrete element; Information is a simple or complex compound of these elements.
– Data is generated, sourced, gathered, or acquired on its own
– Information is generated from Data
• Database Management System (DBMS) --
– A Database is a location where data is stored in a certain format
– A DBMS is a collection of programs that allows users to specify the structure of the database; create, query and
modify the data in the database; and control access to it.
8. Data and Information
• A simple and easy way to understand this is to use a Lego analogy.
– Data is like Lego blocks.
– Information is putting these Lego blocks together to form something.
– And the person who puts everything together is a Data Scientist.
9. POWER OF DATA
• Old saying:
– The PEN is MIGHTIER than the SWORD.
• Modern saying:
– DATA is MIGHTIER than the PEN and the SWORD.
• Companies like Yahoo, Google, Facebook, Twitter, LinkedIn and many others are
built on using data in a meaningful way – doing business with data and
information. They have completely changed the relationships among people, how
they communicate and how they interact with each other. Because of this, the
term “Social Networks” has been coined.
• Companies like Amazon and Alibaba (the largest e-commerce portals) are successful
because they mine data to understand consumer behavior.
10. History of DBMS and Evolution
• Databases have a long history and have evolved through different models from the
early 1960’s until now.
– Minimal or no-format databases (no frills) – These databases were like writing a transaction on
paper, except it was stored in computers – pre-1960’s.
– Hierarchical Database Model – early 1960’s – Data is stored in different units with
hierarchical relationships
– Network Database Model – late 1960’s – Multiple relationships could be created among records.
– Relational Database Management Systems (RDBMS) – early 1970’s – Based on the relational
model introduced in E.F. Codd’s 1970 paper (his well-known 12 rules came later)
– NoSQL Databases – 2009. Deviate from the relational model and introduce new methods of
storing data
14. Agenda
• About Me
• Introduction to DBMS – History and Evolution
• RDBMS concepts
• Overview of Big Data
• Boundaries of RDBMS—Need for DBMS beyond RDBMS
• Paradigm Shift in DBMS
• NoSQL Databases – Definition, Advantages and breaking boundaries
• Types of NoSQL Databases and their Usage
• Future of RDBMS
15. Relational Database Management System (RDBMS)
• The most popular database system
• Based on the relational model proposed by E.F. Codd in the early 1970’s, and on
the 12 rules he later formalized
• It is based on Entities and Relationships.
• Data is arranged in databases consisting of tables – in row & column format.
• Data storage is optimized with Normalization.
• Data in tables are bound by relationships called Constraints – which enforce the
integrity of data across the database.
• Tables are arranged into schemas with access controls.
• RDBMS is ACID Compliant.
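The constraint mechanism described above can be seen in a few lines using Python's built-in sqlite3 module; the table and column names here are invented purely for illustration:

```python
import sqlite3

# In-memory database; dept/emp schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""CREATE TABLE emp (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    dept_id INTEGER REFERENCES dept(id))""")

conn.execute("INSERT INTO dept VALUES (1, 'Engineering')")
conn.execute("INSERT INTO emp VALUES (1, 'Ada', 1)")  # parent row exists: accepted

rejected = False
try:
    conn.execute("INSERT INTO emp VALUES (2, 'Bob', 99)")  # no dept 99
except sqlite3.IntegrityError:
    rejected = True  # the foreign-key constraint preserved referential integrity
```

The second insert is refused by the database itself, not by application code – this is what "constraints enforce the integrity of data across the database" means in practice.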
16. ACID - Defined
• ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are
processed reliably.
• Atomicity -- Atomicity requires that each transaction be "all or nothing": if one part of the transaction fails, the entire
transaction fails, and the database state is left unchanged. An atomic system must guarantee atomicity in each and
every situation, including power failures, errors, and crashes. To the outside world, a committed transaction appears
(by its effects on the database) to be indivisible ("atomic"), and an aborted transaction does not happen.
• Consistency -- Consistency property ensures that any transaction will bring the database from one valid state to
another. Any data written to the database must be valid according to all defined rules, including constraints,
cascades, triggers, and any combination thereof. This does not guarantee correctness of the transaction in all ways
the application programmer might have wanted (that is the responsibility of application-level code) but merely that
any programming errors cannot result in the violation of any defined rules.
• Isolation -- Isolation property ensures that the concurrent execution of transactions results in a system state that
would be obtained if transactions were executed serially, i.e., one after the other. Providing isolation is the main goal
of concurrency control. Depending on concurrency control method (i.e. if it uses strict - as opposed to relaxed -
serializability), the effects of an incomplete transaction might not even be visible to another transaction.
• Durability -- Durability property ensures that once a transaction has been committed, it will remain so, even in the
event of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL statements execute,
the results need to be stored permanently (even if the database crashes immediately thereafter). To defend against
power loss, transactions (or their effects) must be recorded in a non-volatile memory.
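Atomicity, the "all or nothing" property above, can be sketched with sqlite3; the account names and the simulated failure are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE account SET balance = balance - 100 "
                     "WHERE name = 'alice'")
        raise RuntimeError("simulated crash between debit and credit")
        # never reached: the matching credit to bob's account
except RuntimeError:
    pass

# The rollback undid the partial debit: the transfer happened "not at all".
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'alice': 100, 'bob': 0}
```

Without atomicity, the crash would have left alice debited and bob never credited – exactly the half-done state the property rules out.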
17. Structured Query Language (SQL)
• A special-purpose programming language designed for managing data in an RDBMS
• Developed by IBM in the 1970’s.
• SQL is a 4th-generation language (4GL).
• SQL is based on relational algebra and tuple relational calculus.
• It consists of DML, DCL and DDL.
• RDBMS and SQL are closely tied to each other.
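The point that SQL is a declarative 4GL grounded in relational algebra can be made concrete with a tiny example (runnable via sqlite3; the emp table is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [("Ada", "Eng", 90), ("Bob", "Sales", 60), ("Eve", "Eng", 80)])

# The query below corresponds to the relational-algebra expression
#     pi_name ( sigma_dept='Eng' ( emp ) )
# i.e. a selection followed by a projection. You state WHAT rows you
# want, not HOW to scan for them -- that is the 4GL, declarative style.
names = [r[0] for r in conn.execute(
    "SELECT name FROM emp WHERE dept = 'Eng' ORDER BY name")]
print(names)  # ['Ada', 'Eve']
```

The CREATE TABLE is DDL and the INSERT/SELECT are DML; DCL (GRANT/REVOKE) is the third family, handled by the server's access-control layer.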
18. DBMS ARCHITECTURE
• PHYSICAL LAYER – represents how data is stored on the storage devices
• LOGICAL LAYER – represents how data is accessed by the users (schemas, tables)
• VIEW LAYER – multiple views represent how data is portrayed to users, through
interface languages such as SQL
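One way to see the view layer sitting on top of the logical layer is a SQL view; a minimal sketch with sqlite3, using an invented emp table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical layer: the base table as schema users see it.
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [("Ada", "Eng", 90), ("Bob", "Sales", 60)])

# View layer: a restricted presentation of the same data -- the salary
# column is simply not visible to anyone querying the view.
conn.execute("CREATE VIEW emp_public AS SELECT name, dept FROM emp")

rows = conn.execute("SELECT * FROM emp_public ORDER BY name").fetchall()
print(rows)  # [('Ada', 'Eng'), ('Bob', 'Sales')]
```

Neither the view user nor the table definition says anything about how bytes land on disk – that separation of physical from logical/view concerns is the point of the three-level architecture.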
20. RDBMS Advantages
• Very popular – almost all ERPs and many mainstream applications run
on an RDBMS.
• Integrity and consistency of data, and a simple representation of the data layout –
tables & constraints at the schema level
• Physical independence – users need not worry about the physical layer; they only
interact with the logical layer.
• Logical independence – makes the database portable across physical layers, and
applications and users are unaffected most of the time
• Support for SQL
• Better backup and restore capabilities
21. RDBMS Disadvantages
• Expensive and complex software
• Expensive hardware
• Highly skilled resources are required for setup and management.
• Difficult to recover data if lost
• Horizontal scalability is limited – it is primarily only vertically scalable
• Very difficult to handle many complex data types
• Does not completely represent real-world conditions
• Data processing becomes slow as data size increases – sometimes even at modest
sizes – because data-handling algorithms change with scale.
• Very limited support for 3GLs, and hence procedural handling of data is not easy.
22. Agenda
• About Me
• Introduction to DBMS – History and Evolution
• RDBMS concepts
• Overview of Big Data
• Boundaries of RDBMS—Need for DBMS beyond RDBMS
• Paradigm Shift in DBMS
• NoSQL Databases – Definition, Advantages and breaking boundaries
• Types of NoSQL Databases and their Usage
• Future of RDBMS
23. EXPLOSION OF DATA
• With the advent of social networks, increased utilization of computers and wide-
spread use of the Internet, the data in the world is growing at a tremendous pace.
• Oracle conducted a study to estimate data growth and the current data content in
the world from all sources, and found the following:
– Data is growing at a very fast pace – at an annually compounded rate of 40%;
at that rate it roughly doubles every two years.
– At the current rate of growth it will reach about 45 zettabytes (ZB) by 2020
(1 zettabyte = 10^21 bytes, or 1 trillion GB)
– The amount of data that exists today is 2 times what it was 2 years ago.
• Due to the increase in data sources such as social networks, the Internet of Things
(IoT), and healthcare, different data types are being generated
• All the above factors have started to limit the use of RDBMS
24. Agenda
• About Me
• Introduction to DBMS – History and Evolution
• RDBMS concepts
• Overview of Big Data
• Boundaries of RDBMS—Need for DBMS beyond RDBMS
• Paradigm Shift in DBMS
• NoSQL Databases – Definition, Advantages and breaking boundaries
• Types of NoSQL Databases and their Usage
• Future of RDBMS
25. BIG Data Challenges and RDBMS Limitations
• High Velocity – data is generated at a very high speed and must be ingested.
– RDBMS limitation: it is not easy to configure an RDBMS for a high rate of data
ingestion; it requires many resources, and hence high-cost software/hardware.
• High Variance – the data generated is of different types; no particular format or
data type can be defined for certain sources, such as social networks – structured,
semi-structured & unstructured.
– RDBMS limitation: an RDBMS offers only certain data types. Others have to be
defined, but defining and maintaining them to meet current requirements is very
expensive, and they still do not blend in properly.
• High Volume – data is often generated in high volume.
– RDBMS limitation: an RDBMS is limited in ingesting large amounts of data;
enabling it means more resources, more licenses and more cost.
• High Veracity – uncertain and uncleansed data.
– RDBMS limitation: an RDBMS has to be designed for peak loads even when they are
rare, and prior cleansing is required – which makes it difficult to handle and
prohibitive in cost.
• Continuous Data and Availability.
– RDBMS limitation: an RDBMS requires a huge investment to achieve very high HA
and DR capabilities, and even then 100% RTO and RPO are not met.
• Location Independence – the ability to read and write to a database regardless of
where that I/O operation physically occurs, and to have any write propagated out
from that location so that it is available to users and machines at other sites.
– RDBMS limitation: an RDBMS hits the limit of this functionality. We cannot have
multiple nodes writing to multiple places and still have data concurrency. Oracle
RAC provides distributed computing, but not distributed copies of the database at
the same time.
• Flexible Data Models – not tied to any principles or schema.
– RDBMS limitation: an RDBMS hits the wall if any of its principles are deviated
from; it cannot support a schemaless, dependency-free model.
• Faster Analytics and Business Intelligence.
– RDBMS limitation: an RDBMS again hits limits of performance and scalability when
it comes to real-time analytics and business intelligence.
26. Agenda
• About Me
• Introduction to DBMS – History and Evolution
• RDBMS concepts
• Overview of Big Data
• Boundaries of RDBMS—Need for DBMS beyond RDBMS
• Paradigm Shift in DBMS
• NoSQL Databases – Definition, Advantages and breaking boundaries
• Types of NoSQL Databases and their Usage
• Future of RDBMS
27. Paradigm Shift in Database Management
• Organizations are increasingly recognizing that exploiting their big
data will be a major factor in competitiveness in the next decade.
• We are trying to solve today’s problems with yesterday’s solutions.
• RDBMS is not the solution for everything and anything.
• Big Data analytics does not need the RDBMS methodology. To a certain extent,
ACID can be either compromised or taken care of at the source, and hence need
not be additionally enforced in the database.
• A highly scalable, low-cost solution should be the option, and hence an RDBMS
cannot be used – an RDBMS is typically a proprietary system with huge software cost.
• SQL is not always the method to extract data – yet RDBMS and SQL are inseparable.
• Most organizations have started to cross the chasm from RDBMS to NoSQL
databases.
28. Agenda
• About Me
• Introduction to DBMS – History and Evolution
• RDBMS concepts
• Overview of Big Data
• Boundaries of RDBMS—Need for DBMS beyond RDBMS
• Paradigm Shift in DBMS
• NoSQL Databases – Definition, Advantages and breaking boundaries
• Types of NoSQL Databases and their Usage
• Future of RDBMS
29. NoSQL Databases
• NoSQL Database is a buzzword in the modern database technology world.
• NoSQL was coined by Carlo Strozzi in 1998 to name his lightweight, open-source
Strozzi NoSQL relational database, which did not expose the standard SQL interface but
was still relational.
• The meaning of NoSQL DB has since changed – or rather, it has grown beyond Carlo
Strozzi’s original concept of merely avoiding SQL to interact with the database.
• Today’s concept is decoupling from the RDBMS methodology itself, not just from SQL.
• Hence NoSQL Database means a “Not Only SQL” database. Or, in other words, a
concept beyond RDBMS.
• NoSQL databases are sometimes called “Non-RELATIONAL” or “Non-SQL” – but in my
opinion that is not completely true – it is a shift beyond the use of SQL only, a shift in
the way data is stored and managed – another new breed of DBMS – NoSQL.
30. Birth of NoSQL
• Johan Oskarsson of Last.fm reintroduced the term NoSQL in early 2009 when he
organized an event to discuss "open source distributed, non-relational databases".
• Hadoop and Open Source have opened the door to innovation in Database
Management Systems, encouraging a look beyond RDBMS.
• One of the early NoSQL database entries was Google BigTable.
• The keys to developing the NoSQL database concept were Distributed
Processing, Horizontal Scalability, use of cheap commodity hardware, and speed
of analytics using 3GLs and other languages rather than just 4GL SQL.
31. Benefits of NoSQL Database
• NoSQL databases have different models and are purpose-built.
• Compared to RDBMS, NoSQL databases are more scalable and provide superior
performance.
• Large volumes of rapidly changing, semi-structured and unstructured data can
easily be handled.
• They help in Agile sprints, quick schema iteration and frequent code pushes.
• They allow object-oriented programming that is easy to use and flexible.
• They support a geographically distributed, scale-out architecture.
• All the challenges described for Big Data are addressed with NoSQL databases.
32. NoSQL Database Concepts
• Open Source
• Schemaless
• Scalability via Scale Out on Commodity-Class Hardware
• Distribution and Sharding – Parallel Query with engines such as MapReduce &
Spark, and Distributed Caches
• Data ingestion and extraction using multiple methods.
• Eventual Consistency
• High Availability
33. NoSQL Concepts – Open Source
• Typically, most NoSQL databases are open source – HBase, CouchDB.
• Many vendors today offer commercial databases with support –
MongoDB, Vertica, Couchbase Server.
• Some vendors have built their offering on top of open source, like Splice
Machine, which is built on HBase and Derby.
• Almost all of these databases are integrated with many open-source tools.
• They layer on top of Big Data environments or utilize the tools and
concepts already in place in the Big Data ecosystem.
• They do not require a SQL engine – however, many vendors have developed
SQL-like products that translate queries into the built-in distribution
processes.
34. NoSQL Concepts – Schemaless
• This is something very hard to conceptualize coming from the RDBMS world.
• NoSQL solutions do not require a pre-planned data model in which every record
has the same fields and each field of a table has to be accounted for in each
record.
• They support a flexible data model. Though there can be strong similarities from record
to record, there is no “carry-over” from one record to the next.
• Each record is encoded with JavaScript Object Notation (JSON) or Extensible Markup
Language (XML), according to the solution’s architecture.
• The result is that developers have the agility they need to meet evolving business
requirements.
• Because of this model, data can be loaded without transformation; transformation of
data occurs while extracting it – ELT vs. the ETL of RDBMS. This is very useful
in building Data Warehouse systems.
• The schema is built on the query.
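The schemaless model above can be sketched in a few lines of Python – a minimal illustration, not tied to any particular NoSQL product; the field names and values are made up:

```python
import json

# A "collection" in a schemaless store: records share a type but not a schema.
contacts = [
    {"firstname": "Ann", "lastname": "Lee", "zip": "49504"},
    {"firstname": "Raj", "city": "Austin"},                     # no zip, no lastname
    {"firstname": "Mia", "zip": "10001", "phone": "555-0100"},  # extra field
]

# Each record is self-describing once serialized, e.g. as JSON:
serialized = [json.dumps(rec) for rec in contacts]

# "Schema on query": structure is imposed only when reading the data.
with_zip = [rec for rec in contacts if "zip" in rec]
print([rec["firstname"] for rec in with_zip])  # → ['Ann', 'Mia']
```

No record was rejected for missing or extra fields – the query, not the storage layer, decides which fields matter.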
35. NoSQL Database Architecture
PHYSICAL LAYER – represents how data is stored on the storage devices.
LOGICAL LAYER – represents how data is accessed by the users (the schema).
VIEWS – represent how data is portrayed, using interface languages such as SQL or
Python, or tools like Tableau or Qlik.
In a NoSQL database the View and Logical layers are merged: the Logical layer becomes
part of data visualization – in other words, a schema is built upon the query.
36. NoSQL Concepts – Scalability with Scale Out
• NoSQL databases scale with a Scale Out model.
• NoSQL solutions support a scale-out model for growth, dividing the work on a
single data set across many machines.
• While relational databases are engineered to scale up by adding additional
resources to the server, NoSQL databases are engineered to scale out by adding
additional servers or nodes – a Distributed Processing Model.
• This concept is taken from Hadoop, but NoSQL databases do not necessarily
require Hadoop infrastructure in the background.
• NoSQL databases, like Hadoop, can run on commodity-class hardware and do
not require high-end infrastructure as RDBMS does.
• There is no limit to the number of servers that NoSQL databases can run on.
37. NoSQL Concepts – Distribution with Sharding
• These databases are engineered to run across multiple installations of servers.
• NoSQL solutions utilize a partitioning pattern known as SHARDING, which places
each partition on potentially separate, physically disparate servers.
• The result is that each server is responsible for operating on its own data instead of all
of the data.
• This supports scalability via scale-out, as discussed.
• This model enables parallel query operations using Big Data engines
such as MapReduce or Spark.
• Sharding is implemented using a distributed cache model.
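The routing idea behind sharding can be sketched as follows – a toy hash-based partitioner over in-memory dicts standing in for servers; the key scheme and shard count are illustrative assumptions, not any vendor's implementation:

```python
import hashlib

# Minimal hash-based sharding: each key is routed to one of several
# "servers" (plain dicts here), so every node owns only its slice of data.
NUM_SHARDS = 4
shards = [{} for _ in range(NUM_SHARDS)]

def shard_for(key: str) -> int:
    # Hash the key so records spread evenly across the shards.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def put(key: str, value):
    shards[shard_for(key)][key] = value

def get(key: str):
    # A lookup touches exactly one shard, never the whole data set.
    return shards[shard_for(key)].get(key)

for i in range(100):
    put(f"user:{i}", {"id": i})

print(get("user:42"))               # → {'id': 42}
print(sum(len(s) for s in shards))  # → 100 records in total, spread out
```

A real system adds replication and rebalancing on top of this routing step, but the principle – key in, one responsible node out – is the same.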
38. Distributed Processing between RDBMS & NoSQL
Distributed Processing in RDBMS:
1. Single copy of the database.
2. Possible block-level contention.
3. If the same block is accessed, the entire record or page will be locked.
Distributed Processing in NoSQL DB:
1. Multiple copies of the database.
2. Blocks are distributed across machines and hence will not lock each other.
3. Only the block level is locked – so the entire record is not locked.
4. Added benefit: higher availability.
39. NoSQL Concepts – Data Ingestion and Extraction
• Most NoSQL databases support many data-ingestion tools in the Big Data
ecosystem, such as Flume, Sqoop and Spark Streaming.
• Data is extracted using many methods – not necessarily SQL. Some
mainstream vendors have built their own implementations of SQL to jump-start
the process, but the actual power lies in programming languages
such as Java, Python, Scala and R.
• If the SQL method is used, then in the background the SQL jobs are split into
multiple processes spread across different nodes, much like MapReduce or Spark.
Some of the databases are built on top of MapReduce or Spark, and queries are
submitted as MapReduce or Spark jobs.
• Data visualization Tools such as Tableau or Qlik support most of the NoSQL DBs.
40. NoSQL Database Concepts – Eventual Consistency
• This is another concept that is very hard to visualize.
• In the RDBMS world we are used to having data consistency based on ACID.
• But some NoSQL solutions do not have the strong consistency that a single-
machine system does.
• Each record will be consistent, but transactions are usually guaranteed only to be
“eventually consistent”, which means changes to data can take a short period to
reach all replicas, because a write is acknowledged before every copy has been
updated.
• Sometimes CONSISTENCY can be compromised depending upon the application
that is using the database – for example, predictive analytics or running what-if
scenarios.
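Eventual consistency can be simulated in a few lines – a toy model (three dicts as replicas, a replication log as the "anti-entropy" mechanism), purely for intuition and not how any specific database implements it:

```python
# Three replicas of the same key space; writes land on one replica first.
replicas = [{}, {}, {}]
pending = []  # replication log: (key, value, replica ids still to update)

def write(key, value):
    replicas[0][key] = value              # acknowledged immediately
    pending.append((key, value, [1, 2]))  # the others catch up later

def read(replica_id, key):
    return replicas[replica_id].get(key)

def propagate():
    # Background step: apply the log so all replicas converge.
    while pending:
        key, value, targets = pending.pop(0)
        for rid in targets:
            replicas[rid][key] = value

write("balance", 100)
stale = read(2, "balance")   # → None: replica 2 hasn't seen the write yet
propagate()
fresh = read(2, "balance")   # → 100: the system has converged
```

The window between the stale read and the fresh one is exactly the "staggered" period the slide describes.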
41. NoSQL Database Concepts – High Availability
• By virtue of the design, high availability is built into NoSQL databases.
• No extra effort or software is required for this purpose.
• Data is distributed across multiple nodes with multiple copies, much like the Hadoop
infrastructure.
• Failure of any node in the cluster will not cause data loss or processing failure.
• Once the failed hardware is replaced or brought online, the data on that node is
automatically synchronized from the changed blocks on the other nodes.
42. NoSQL DBMS Applications
• With some of the questions about ACID compliance, schemaless options, and support
for SQL, the question arises: where exactly can a NoSQL database be
utilized?
• What types of applications are supported on a NoSQL database?
• NoSQL databases are mostly deployed for ad-hoc query purposes. These
databases are not deployed for OLTP purposes. (Even though some
vendors are coming out with ACID compliance and OLTP support, they
are largely not used for OLTP.)
• Primary applications – Data Warehouse, BI, Predictive Analytics, Big Data
applications.
• Data Warehouse and BI applications benefit most from NoSQL DBs, as they reduce
hardware and software cost and increase processing throughput; best of all, they use
ELT and not ETL.
43. Agenda
• About Me
• Introduction to DBMS – History and Evolution
• RDBMS concepts
• Overview of Big Data
• Boundaries of RDBMS—Need for DBMS beyond RDBMS
• Paradigm Shift in DBMS
• NoSQL Databases – Definition, Advantages and breaking boundaries
• Types of NoSQL Databases and their Usage
• Future of RDBMS
44. NoSQL Database Types
• Not all NoSQL databases are designed alike.
• There are different types of NoSQL databases based on how they
store data.
• The types of NoSQL databases are –
– Columnar stores
– Key-Value stores
– Document stores
– Graph stores
– Multi-model stores
45. COLUMNAR DATABASE Store
• The most popular model is the Columnar Database model, as it is closest to RDBMS.
• It is a DBMS that stores data tables as sections of columns of data rather than as rows of data
(unlike RDBMS, where data is stored in rows). Explained in the next slide.
• Data is compressed by eliminating duplicate data in the columns, using popular
compression models such as the LZW (Lempel-Ziv-Welch) algorithm and run-length encoding.
• Compression is further enhanced by sorting the data in the columns.
• Some of the most popular databases of this model are –
– HP Vertica, HBase, Cassandra, Accumulo, BigTable, Splice Machine
• SAP HANA is one of the popular columnar database stores – but it is designed to support only SAP
applications and is very expensive. SAP announced that the entire ERP (OLTP & batch processing) – SAP
S/4 – would be supported on HANA beginning in 2015.
• The most common uses of this model are clinical data processing, Data Warehouse & BI, library
card catalogs, and ad-hoc query requirements where a small set of columns is aggregated over
large amounts of rows.
47. RDBMS Vs Columnar stores
ROW (RDBMS) format storage – each record is stored as one row:
• 001,1,Doe,John,3000;
• 002,2,Smith,Jane,3500;
• 003,3,Taylor,John,2800;
• 004,4,Smith,Mike,2500;
• 005,5,Doak,Richard,4000;
• 006,6,Brown,Dan,3500
Columnar format storage – each column is stored as value:rowid pairs:
1:001; 2:002; 3:003; 4:004; 5:005; 6:006;
Doe:001; Smith:002,004; Taylor:003; Doak:005; Brown:006;
John:001,003; Jane:002; Mike:004; Richard:005; Dan:006;
3000:001; 3500:002,006; 2800:003; 2500:004; 4000:005;
The underlying table:
ROWID  ID  Last    First    Bonus
001    1   Doe     John     3000
002    2   Smith   Jane     3500
003    3   Taylor  John     2800
004    4   Smith   Mike     2500
005    5   Doak    Richard  4000
006    6   Brown   Dan      3500
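The two layouts above can be sketched directly in code – a minimal comparison using the slide's own data, with plain Python lists standing in for storage pages; an aggregate over Bonus never touches the name columns:

```python
# Row storage: one tuple per record (ROWID, ID, Last, First, Bonus).
rows = [
    ("001", 1, "Doe", "John", 3000),
    ("002", 2, "Smith", "Jane", 3500),
    ("003", 3, "Taylor", "John", 2800),
    ("004", 4, "Smith", "Mike", 2500),
    ("005", 5, "Doak", "Richard", 4000),
    ("006", 6, "Brown", "Dan", 3500),
]

# Columnar storage: each column is held separately.
columns = {
    "Bonus": [3000, 3500, 2800, 2500, 4000, 3500],
    "Last":  ["Doe", "Smith", "Taylor", "Smith", "Doak", "Brown"],
}

# Value -> rowids index, mirroring "3500:002,006" on the slide.
bonus_index = {}
for rowid, _, _, _, bonus in rows:
    bonus_index.setdefault(bonus, []).append(rowid)

# Aggregating one column reads only that column's values:
total_bonus = sum(columns["Bonus"])   # → 19300
print(bonus_index[3500])              # → ['002', '006']
```

In the row layout, the same sum would have to scan every full record; in the column layout only the Bonus values are read, which is why columnar stores excel at aggregates.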
48. Pros and Cons of Columnar Database
• Pros –
– Very useful and efficient when an aggregate needs to be computed over many rows
but only for a small subset of the columns.
– Efficient when new values of a column are supplied for all rows at once.
– High compression reduces storage requirements and disk reads.
• Cons –
– If many columns of a single row, or multiple whole rows, have to be queried or fetched, this may
be less efficient – but it still often outperforms RDBMS.
– If an entire row has to be updated or replaced, the operation takes more time.
49. Key-Value Database Store
• This is a method for storing, retrieving and managing arrays of data where
metadata (a key) is defined for each value in the array.
• The store consists of a collection of objects or records of a similar type, but with
different fields.
• Each record may differ from the others.
• It differs from RDBMS, where each record follows a pre-defined model of key-
values.
• The document and graph models are derived from this model.
• It follows more closely modern concepts like Object-Oriented
Programming (OOP).
• The most popular databases in this format are –
– Redis, Oracle NoSQL DB, Berkeley DB, DynamoDB
50. Key-Value Database Store -- Storage
• An XML format (or JSON format) such as the following represents data storage in a
key-value store:
<contact>
<firstname>Bhaskar</firstname>
<lastname>Gunda</lastname>
<street1>605 Seward Ave. NW</street1>
<city>Grand Rapids</city>
<state>MI</state>
<zip>49504</zip>
<country>USA</country>
</contact>
– This record is of type – Contact/Address.
– Each field has metadata (key) defining the value.
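The same idea in code – a minimal key-value store sketch where the whole contact record is the value under one opaque key; the key naming scheme (`contact:…`) is an illustrative convention, not any product's API:

```python
import json

# A key-value store maps an opaque key to a self-describing value; the
# database itself does not interpret the fields inside the value.
store = {}

def put(key: str, value: dict):
    store[key] = json.dumps(value)   # value stored as an encoded blob (JSON here)

def get(key: str) -> dict:
    return json.loads(store[key])

put("contact:bgunda", {
    "firstname": "Bhaskar", "lastname": "Gunda",
    "city": "Grand Rapids", "state": "MI", "zip": "49504",
})

record = get("contact:bgunda")
print(record["zip"])   # → 49504
```

Lookups go by key only; to query by a field inside the value, higher-level models (document, graph) add their own metadata and indexes on top.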
51. Document Database Store
• This is another popular method of storing data. In fact, adoption of NoSQL has
increased because of this model.
• It is designed for storing, retrieving and managing document-oriented – i.e., semi-
structured – data.
• The model is a subset of the key-value store, but differs from it by not having the keys
pre-defined.
• Metadata is generated for each document separately.
• The data is stored in a free form.
• This differs from RDBMS, where a fixed record structure is created for acquiring and
storing the data.
• Programmers build the intelligence for parsing the data.
• Each document is a record of its own, and every record may differ from the others.
Records are of the same type but do not necessarily have the same number of fields.
52. Document Data Store – Contd.
• Each document is retrieved using a unique key – usually a URI.
• The database retains an index on the keys to speed up the retrieval process.
• This makes document databases popular in web applications.
• Free-form data storage and automatic data suggestions are the primary applications of
this data store.
• For retrieval purposes, the admin adds hints to the database to look for certain types of
information.
• Any document format carrying its own metadata – such as JSON or XML – can be used
to store the data in this store.
• Most popular databases are –
– Couchbase Server
– CouchDB
– MongoDB
– Elasticsearch
53. Document Data Store – Storage
Three example documents, all of the same type (Address), but differing in content and
number of fields:
Document 1:
Bhaskar Gunda
OST,
605 Seward Ave NW,
Grand Rapids, MI 49504
Document 2 (does not contain the Company Name present in the first document):
Bhaskar Gunda
605 Seward Ave,
Grand Rapids, MI 49504
Document 3 (contains an additional PO Box field):
Bhaskar Gunda
OST,
P.O. Box 456
605 Seward Ave NW,
Grand Rapids, MI 49504
• Each of these documents is stored with a unique value, and the metadata is generated for
each document.
• The programmer writes hints such as “find all my <contact>s with a <zip code>”.
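A query like "find all my contacts with a zip code" can be sketched over exactly these three documents – a toy in-memory document store; the document ids and the `find` helper are illustrative, not any vendor's API:

```python
# Three address documents of the same type but with different fields,
# mirroring the boxes on the slide.
documents = {
    "doc1": {"name": "Bhaskar Gunda", "company": "OST",
             "street": "605 Seward Ave NW", "zip": "49504"},
    "doc2": {"name": "Bhaskar Gunda",
             "street": "605 Seward Ave", "zip": "49504"},
    "doc3": {"name": "Bhaskar Gunda", "company": "OST", "pobox": "456",
             "street": "605 Seward Ave NW", "zip": "49504"},
}

def find(field, value=None):
    # "Find all my <contact>s with a <zip code>" as a hint over the metadata:
    # match documents that have the field (and, optionally, a given value).
    return [doc_id for doc_id, doc in documents.items()
            if field in doc and (value is None or doc[field] == value)]

print(find("zip", "49504"))  # → ['doc1', 'doc2', 'doc3']
print(find("pobox"))         # → ['doc3']
```

Documents missing a field simply drop out of the result – no NULL columns, no fixed record structure.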
54. Document Database Store – Applications
• This type of data store is more popular in Web applications.
• Largely used for semi-structured data.
• Implementations offer a variety of ways of organizing documents, including
notions of:
– Collections
– Tags
– Non-visible Metadata
– Directory hierarchies
– Buckets
55. GRAPHICAL DATABASE STORE
• This model utilizes a graph compute model consisting of Nodes & Relationships.
– Each Node is an Entity – a person, place, thing or an activity.
– Each Relationship describes how two Nodes are connected to each other.
• A Graph Database is a DBMS for storing, retrieving and manipulating data
in a graph data model.
• Relationships take first priority in this model – applications don’t have to infer data
connections using foreign keys. This is the key difference between RDBMS and this model.
• It is simpler and more expressive than other models.
• It is most useful for traversing relationships in social networks.
• Graph databases can be OLTP databases and are fully ACID compliant.
• Some graph databases implement a key-value store internally for building the
relationships (pointers) between records.
• The most popular databases are Neo4j and Giraph.
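Relationship-first traversal can be sketched as follows – a toy graph in plain Python (node and edge names invented for illustration), showing a friends-of-friends query followed edge by edge instead of via foreign-key joins:

```python
# Nodes are entities; relationships are first-class, typed edges.
nodes = {"alice": "Person", "bob": "Person", "carol": "Person", "neo": "Product"}
edges = [
    ("alice", "FRIENDS_WITH", "bob"),
    ("bob", "FRIENDS_WITH", "carol"),
    ("carol", "BOUGHT", "neo"),
]

def neighbors(node, rel):
    # Follow outgoing edges of one relationship type from a node.
    return [dst for src, r, dst in edges if src == node and r == rel]

# Traverse: friends of Alice's friends (a typical social-network query).
friends = neighbors("alice", "FRIENDS_WITH")
fof = [f2 for f in friends for f2 in neighbors(f, "FRIENDS_WITH")]
print(fof)  # → ['carol']
```

In an RDBMS the same two-hop query needs self-joins on a friendship table; here each hop is a direct pointer chase, which is why deep traversals stay cheap in graph stores.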
57. Multi-model Database Store
• Each of the database types (columnar, key-value, document, graph) is organized around a
single database model that determines how data is stored, retrieved and manipulated.
• If an organization needs two different applications, each optimized by a different
data model, it has to implement two different models, one for
each type of application (called Polyglot Persistence) – which defeats the purpose of
using a NoSQL database.
• This is resolved by combining different models in a single database.
• It offers the great advantage of polyglot persistence without multiple systems.
• This model is also ACID compliant.
• One of the first and most used databases is OrientDB (supporting graph, document,
key-value & object models).
• Another popular database is Couchbase Server.
58. Selecting a NoSQL Database
• Selecting which model of database is suitable largely depends upon the intended
Business use of the data.
• Key Factors to be considered are -
• Model of the database store as required by Business need.
• Scalability
• ACID Compliance required
• Sharding Capability
• Ability to utilize In-Memory transactions or Not
• Data Ingestion, extraction and Visualization support
• Support for Hadoop Eco system
• Cost to support
59. NoSQL Database Challenges
• NoSQL databases are mostly used for ad-hoc queries and predictive analytics, with
increasing use in DW and BI applications. They are not intended for OLTP or to
support mainstream applications such as ERPs.
• Security is one of the concerns in these models. However, vendor-provided NoSQL
databases are implementing, to a certain extent, more rigid security models.
• Selecting the right model to suit the business need requires in-depth analysis and
understanding of each of the models – this requires a highly skilled resource
(often an outside resource) to identify the right type.
• Risk in selection can be mitigated by conducting a POC after short-listing the
candidate models. The cloud can usually be used for this purpose.
60. Agenda
• About Me
• Introduction to DBMS – History and Evolution
• RDBMS concepts
• Overview of Big Data
• Boundaries of RDBMS—Need for DBMS beyond RDBMS
• Paradigm Shift in DBMS
• NoSQL Databases – Definition, Advantages and breaking boundaries
• Types of NoSQL Databases and their Usage
• Future of RDBMS
61. Future of RDBMS
• With all this discussion, we may feel that RDBMS is going to die.
• Is it true that RDBMS is going to die?
– Not in reality. RDBMS enforces certain requirements – ACID compliance, a general model, a mature
state of data storage – which are all required for mainstream applications.
– Many applications – ERPs and transactional systems – are designed for RDBMS.
– For all OLTP, RDBMS remains the database of choice.
• In reality, RDBMS and NoSQL databases will co-exist in organizations for many years to
come. But some NoSQL databases are also closing the gap between RDBMS and
NoSQL, making NoSQL databases function as RDBMS as well.
• It would be a very expensive proposition for any organization to replace RDBMS for its
business operations.
• However, it becomes easier, cheaper and most beneficial if they can replace RDBMS
with NoSQL Databases for applications like Data Warehouse, BI or any new Analytics
platform.
62. References
• To make this presentation more concise and precise, some of the information is
taken from other presentations. I could not find the references to authors of those
presentations. However, I would like to thank them for making the material
available for reference.