Reference Guide

Figure 36 Cassandra Row
This is a simple row with columns. Other variants, such as Composite Columns and Super
Columns, allow more levels of nesting. Consider them only if the design needs the extra
nesting.
One important characteristic of Cassandra is that it is schema-optional. This means the columns
need not be defined upfront. They can be added dynamically as and when required, and rows
need not all have the same number or type of columns.
Some important points to note when migrating data from an RDBMS to NoSQL are as
follows:
• Model data with nested sorted maps in mind, as mentioned above. This gives faster
  response times for queries.
• Model column families around queries.
• De-normalize data as needed. Too much de-normalization can have side effects, so the
  right balance needs to be struck.
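The nested sorted map view can be sketched in plain Java. The following is only an in-memory illustration and not Cassandra or controller code: the class name NestedSortedMapSketch, its methods, and the sample row keys and columns are all hypothetical. The outer map is keyed by row key and the inner map by column name, both kept sorted, and each row carries only the columns that were written to it, which mirrors the schema-optional behavior described above.

```java
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of a column family as a nested sorted map (illustrative only). */
public class NestedSortedMapSketch {

    // Outer map: row key -> row. Inner map: column name -> column value.
    // TreeMap keeps both levels sorted by key.
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    /** Adds or overwrites a column; the row and the column are created on demand. */
    public void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    /** Returns the columns of a row in sorted order, or an empty map if the row is absent. */
    public Map<String, String> row(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    public static void main(String[] args) {
        NestedSortedMapSketch users = new NestedSortedMapSketch();
        users.put("user-2", "name", "Bob");
        users.put("user-1", "name", "Alice");
        users.put("user-1", "email", "alice@example.com"); // only this row has an email column

        System.out.println(users.row("user-1")); // {email=alice@example.com, name=Alice}
        System.out.println(users.row("user-2")); // {name=Bob}
    }
}
```

Note that no column is declared ahead of time: "email" exists only in the row where it was stored.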
Modeling Data around Queries
In relational systems, entities and relationships are modeled first, and indexes are then
added to support whatever queries become necessary. With Cassandra, the queries that need to be
supported efficiently are identified ahead of time and the data model is built around them.
Cassandra does not support joins at query time because of its highly scalable, distributed nature.
This mandates duplication and de-normalization of data. Every column family in a Cassandra
keyspace is self-contained, holding all the data necessary to satisfy a given query. The design
therefore moves towards a “Column Family per query” model.
In the HP VAN SDN Controller, define a column family for every entity. For each query on that
entity, define a secondary column family. Each secondary column family serves exactly one
query.
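A minimal in-memory sketch of the "Column Family per query" idea follows. It does not use the controller's persistence API. The "switch" entity, its "vendor" and "name" columns, and every class and method name are assumptions made for illustration. One map stands in for the primary column family of the entity, and a second map stands in for a secondary column family that answers exactly one query (switches by vendor). Because there are no joins, the write path keeps both copies up to date.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory sketch of one primary and one query-specific column family (illustrative only). */
public class ColumnFamilyPerQuerySketch {

    // Primary column family: row key = switch id, columns = entity attributes.
    private final Map<String, Map<String, String>> switches = new TreeMap<>();

    // Secondary column family for the single query "switches by vendor":
    // row key = vendor, column names = switch ids.
    private final Map<String, SortedSet<String>> switchesByVendor = new TreeMap<>();

    /** Writes the entity and updates the de-normalized copy in the same operation. */
    public void store(String id, String vendor, String name) {
        Map<String, String> previous = switches.get(id);
        if (previous != null) {
            // Drop the stale entry so the secondary column family stays consistent.
            SortedSet<String> old = switchesByVendor.get(previous.get("vendor"));
            if (old != null) {
                old.remove(id);
            }
        }
        Map<String, String> columns = new TreeMap<>();
        columns.put("vendor", vendor);
        columns.put("name", name);
        switches.put(id, columns);
        switchesByVendor.computeIfAbsent(vendor, v -> new TreeSet<>()).add(id);
    }

    /** Query served by the primary column family; returns null if the id is unknown. */
    public Map<String, String> findById(String id) {
        return switches.get(id);
    }

    /** Query served by the secondary column family, without any join. */
    public SortedSet<String> findIdsByVendor(String vendor) {
        return switchesByVendor.getOrDefault(vendor, Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        ColumnFamilyPerQuerySketch store = new ColumnFamilyPerQuerySketch();
        store.store("sw-1", "vendor-a", "core");
        store.store("sw-2", "vendor-a", "edge");
        store.store("sw-3", "vendor-b", "lab");
        System.out.println(store.findIdsByVendor("vendor-a")); // [sw-1, sw-2]
    }
}
```

The cost of this design is extra work and storage on every write. The benefit is that each read touches a single row in a single column family.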
Reference Application using Distributed Persistence
Any application that uses distributed persistence in the HP VAN SDN Controller needs
to define the following components:
• A Business Logic component as an OSGi service.
• A reference to Distributed DataStoreService and Distributed QueryService.
• A DTO (transport object) per entity.